Mind mappings: enabling efficient algorithm-accelerator mapping space search

Authors:
Kartik Hegde

University of Illinois at Urbana-Champaign, USA

University of Illinois at Urbana-Champaign, USA
View Profile

,
Po-An Tsai

NVIDIA, USA

NVIDIA, USA
View Profile

,
Sitao Huang

University of Illinois at Urbana-Champaign, USA

University of Illinois at Urbana-Champaign, USA
View Profile

,
Vikas Chandra

Facebook, USA

Facebook, USA
View Profile

,
Angshuman Parashar

NVIDIA, USA

NVIDIA, USA
View Profile

,
Christopher W. Fletcher

University of Illinois at Urbana-Champaign, USA

University of Illinois at Urbana-Champaign, USA
View Profile

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating SystemsApril 2021Pages 943–958https://doi.org/10.1145/3445814.3446762

Published:17 April 2021Publication History

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 943–958

ABSTRACT

Modern day computing increasingly relies on specialization to satiate growing performance and efficiency requirements. A core challenge in designing such specialized hardware architectures is how to perform mapping space search, i.e., search for an optimal mapping from algorithm to hardware. Prior work shows that choosing an inefficient mapping can lead to multiplicative-factor efficiency overheads. Additionally, the search space is not only large but also non-convex and non-smooth, precluding advanced search techniques. As a result, previous works are forced to implement mapping space search using expert choices or sub-optimal search heuristics.

This work proposes Mind Mappings, a novel gradient-based search method for algorithm-accelerator mapping space search. The key idea is to derive a smooth, differentiable approximation to the otherwise non-smooth, non-convex search space. With a smooth, differentiable approximation, we can leverage efficient gradient-based search algorithms to find high-quality mappings. We extensively compare Mind Mappings to black-box optimization schemes used in prior work. When tasked to find mappings for two important workloads (CNN and MTTKRP), Mind Mapping finds mappings that achieve an average 1.40×, 1.76×, and 1.29× (when run for a fixed number of steps) and 3.16×, 4.19×, and 2.90× (when run for a fixed amount of time) better energy-delay product (EDP) relative to Simulated Annealing, Genetic Algorithms and Reinforcement Learning, respectively. Meanwhile, Mind Mappings returns mappings with only 5.32× higher EDP than a possibly unachievable theoretical lower-bound, indicating proximity to the global optima.

References

Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jefrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geofrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. https://www.tensorflow.org/ Software available from tensorflow. org.Google Scholar
Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, et al. 2019. Learning to optimize halide with tree search and random programs. ACM Transactions on Graphics (TOG) 38, 4 ( 2019 ), 1-12.Google ScholarDigital Library
Byung Hoon Ahn, Prannoy Pilligundla, and Hadi Esmaeilzadeh. 2019. Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation. arXiv preprint arXiv: 1905. 12799 ( 2019 ).Google Scholar
Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: inefectual-neuron-free deep neural network computing. In Proceedings of the 43rd International Symposium on Computer Architecture. 1-13.Google ScholarDigital Library
Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jefrey Bosboom, Una-May O'Reilly, and Saman Amarasinghe. 2014. OpenTuner: An Extensible Framework for Program Autotuning. In International Conference on Parallel Architectures and Compilation Techniques. Edmonton, Canada. http://groups.csail.mit.edu/commit/papers/2014/ansel-pact14-opentuner.pdfGoogle ScholarDigital Library
Charles Audet, J Denni, Douglas Moore, Andrew Booker, and Paul Frank. 2000. A surrogate-model-based method for constrained optimization. In 8th symposium on multidisciplinary analysis and optimization.Google ScholarCross Ref
Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, and Saman Amarasinghe. 2019. Tiramisu: A polyhedral compiler for expressing fast and portable code. In Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. IEEE Press, 193-205.Google ScholarCross Ref
Richard Bellman. 1957. A Markovian decision process. Journal of mathematics and mechanics ( 1957 ), 679-684.Google Scholar
Eliot Bolduc, George C Knee, Erik M Gauger, and Jonathan Leach. 2017. Projected gradient descent algorithms for quantum state tomography. npj Quantum Information 3, 1 ( 2017 ), 1-9.Google Scholar
Justin A Boyan and Andrew W Moore. 1995. Generalization in reinforcement learning: Safely approximating the value function. In Advances in neural information processing systems. 369-376.Google Scholar
J Douglas Carroll and Jih-Jie Chang. 1970. Analysis of individual diferences in multidimensional scaling via an N-way generalization of ?Eckart-Young? decomposition. Psychometrika 35, 3 ( 1970 ), 283-319.Google Scholar
Prasanth Chatarasi, Hyoukjun Kwon, Natesh Raina, Saurabh Malik, Vaisakh Haridas, Tushar Krishna, and Vivek Sarkar. 2020. MARVEL: A Decoupled Model-driven Approach for Eficiently Mapping Convolutions on Spatial DNN Accelerators. arXiv preprint arXiv: 2002. 07752 ( 2020 ).Google Scholar
Stephen Chen, James Montgomery, and Antonio Bolufé-Röhler. 2015. Measuring the curse of dimensionality and its efects on particle swarm optimization and diferential evolution. Applied Intelligence 42, 3 ( 2015 ), 514-526.Google Scholar
Tianqi Chen and Carlos Guestrin. 2016. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. ACM, 785-794.Google ScholarDigital Library
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578-594.Google Scholar
Tianshi Chen, Ke Tang, Guoliang Chen, and Xin Yao. 2012. A large population size can be unhelpful in evolutionary algorithms. Theoretical Computer Science 436 ( 2012 ), 54-70.Google Scholar
Wenzheng Chen, Parsa Mirdehghan, Sanja Fidler, and Kiriakos N Kutulakos. 2020. Auto-Tuning Structured Light by Optical Stochastic Gradient Descent. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.Google ScholarCross Ref
Yunji Chen, Tao Luo, Shaoli Liu, Shijin Zhang, Liqiang He, Jia Wang, Ling Li, Tianshi Chen, Zhiwei Xu, Ninghui Sun, et al. 2014. Dadiannao: A machinelearning supercomputer. In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 609-622.Google ScholarDigital Library
Yudong Chen and Martin J Wainwright. 2015. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint arXiv:1509.03025 ( 2015 ).Google Scholar
Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A Spatial Architecture for Energy-Eficient Dataflow for Convolutional Neural Networks. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, 367-379.Google ScholarDigital Library
Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, and Vivienne Sze. 2019. Eyeriss v2: A flexible accelerator for emerging deep neural networks on mobile devices. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 9, 2 ( 2019 ), 292-308.Google ScholarCross Ref
Jason Chiang, Michael Studniberg, Jack Shaw, Stephen Seto, and Kevin Truong. 2006. Hardware accelerator for genomic sequence alignment. In 2006 International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 5787-5789.Google ScholarCross Ref
Jason Chiang, Michael Studniberg, Jack Shaw, Stephen Seto, and Kevin Truong. 2006. Hardware accelerator for genomic sequence alignment. In 2006 International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 5787-5789.Google ScholarCross Ref
Shail Dave, Youngbin Kim, Sasikanth Avancha, Kyoungwoo Lee, and Aviral Shrivastava. 2019. DMazerunner: Executing perfectly nested loops on dataflow accelerators. ACM Transactions on Embedded Computing Systems (TECS) 18, 5s ( 2019 ), 1-27.Google ScholarDigital Library
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. Ieee, 248-255.Google ScholarCross Ref
Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 92-104.Google ScholarDigital Library
Hans Eberle, Nils Gura, Daniel Finchelstein, Sheueling Chang-Shantz, and Vipul Gupta. 2009. Hardware accelerator for elliptic curve cryptography. US Patent 7, 508, 936.Google Scholar
Félix-Antoine Fortin, François-Michel De Rainville, Marc-André Gardner, Marc Parizeau, and Christian Gagné. 2012. DEAP: Evolutionary Algorithms Made Easy. Journal of Machine Learning Research 13 (july 2012 ), 2171-2175.Google ScholarDigital Library
David E Goldberg. 2006. Genetic algorithms. Pearson Education India.Google Scholar
Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D Sculley. 2017. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1487-1495.Google ScholarDigital Library
Will Grathwohl, Dami Choi, Yuhuai Wu, Geofrey Roeder, and David Duvenaud. 2017. Backpropagation through the void: Optimizing control variates for blackbox gradient estimation. arXiv preprint arXiv:1711.00123 ( 2017 ).Google Scholar
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: Eficient Inference Engine on Compressed Deep Neural Network. In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE Computer Society, 243-254.Google ScholarDigital Library
Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and hufman coding. arXiv preprint arXiv:1510.00149 ( 2015 ).Google Scholar
Ahmad Hassanat, Khalid Almohammadi, Esra' Alkafaween, Eman Abunawas, Awni Hammouri, and VB Prasath. 2019. Choosing Mutation and Crossover Ratios for Genetic Algorithms-A Review with a New Dynamic Approach. Information 10, 12 ( 2019 ), 390.Google Scholar
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770-778.Google ScholarCross Ref
Kartik Hegde, Rohit Agrawal, Yulun Yao, and Christopher W Fletcher. 2018. Morph: Flexible Acceleration for 3D CNN-based Video Understanding. In 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 933-946.Google ScholarDigital Library
Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel, Edgar Solomonik, Joel Emer, and Christopher W Fletcher. 2019. ExTensor: An Accelerator for Sparse Tensor Algebra. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. 319-333.Google ScholarDigital Library
Kartik Hegde, Jiyong Yu, Rohit Agrawal, Mengjia Yan, Michael Pellauer, and Christopher Fletcher. 2018. Ucnn: Exploiting computational reuse in deep neural networks via weight repetition. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 674-687.Google ScholarDigital Library
John Henry Holland et al. 1992. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press.Google Scholar
Peter J Huber. 1992. Robust estimation of a location parameter. In Breakthroughs in statistics. Springer, 492-518.Google Scholar
Tassadaq Hussain, Oscar Palomar, Adrian Cristal, Eduard Ayguadé, and Amna Haider. 2015. ViPS: Visual processing system for medical imaging. In 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI). IEEE, 40-45.Google ScholarCross Ref
Engin Ïpek, Sally A McKee, Rich Caruana, Bronis R de Supinski, and Martin Schulz. 2006. Eficiently exploring architectural design spaces via predictive modeling. ACM SIGOPS Operating Systems Review 40, 5 ( 2006 ), 195-206.Google ScholarDigital Library
Norman P Jouppi, Clif Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th annual international symposium on computer architecture. 1-12.Google ScholarDigital Library
Sheng-Chun Kao and Tushar Krishna. 2020. GAMMA: automating the HW mapping of DNN models on accelerators via genetic algorithm. In 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE.Google ScholarDigital Library
Scott Kirkpatrick, C Daniel Gelatt, and Mario P Vecchi. 1983. Optimization by simulated annealing. science 220, 4598 ( 1983 ), 671-680.Google Scholar
Robert Kleinberg, Yuanzhi Li, and Yang Yuan. 2018. An alternative view: When does SGD escape local minima? arXiv preprint arXiv: 1802. 06175 ( 2018 ).Google Scholar
Tamara G Kolda and Brett W Bader. 2009. Tensor decompositions and applications. SIAM review 51, 3 ( 2009 ), 455-500.Google Scholar
Vijay R Konda and John N Tsitsiklis. 2000. Actor-critic algorithms. In Advances in neural information processing systems. 1008-1014.Google Scholar
Slawomir Koziel and Leifur Leifsson. 2013. Surrogate-based modeling and optimization. Springer.Google Scholar
Alex Krizhevsky, Ilya Sutskever, and Geofrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 ( 2012 ), 1097-1105.Google Scholar
Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, and Tushar Krishna. 2019. Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 754-768.Google ScholarDigital Library
Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Reconfigurable Interconnects. SIGPLAN Not. 53, 2 (March 2018 ), 461-475. https://doi.org/10. 1145/3296957.3173176 Google ScholarDigital Library
Yann LeCun, D Touresky, G Hinton, and T Sejnowski. 1988. A theoretical framework for back-propagation. In Proceedings of the 1988 connectionist models summer school, Vol. 1. CMU, Pittsburgh, Pa: Morgan Kaufmann, 21-28.Google Scholar
Yann A. LeCun, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. 2012. Eficient BackProp. Springer Berlin Heidelberg, Berlin, Heidelberg, 9-48. https: //doi.org/10.1007/978-3-642-35289-8_3 Google ScholarCross Ref
Benjamin C Lee and David M Brooks. 2007. Illustrative design space studies with microarchitectural regression models. In 2007 IEEE 13th International Symposium on High Performance Computer Architecture. IEEE, 340-351.Google ScholarDigital Library
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2016. Continuous control with deep reinforcement learning. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. http://arxiv.org/abs/1509.02971Google Scholar
Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. Darts: Diferentiable architecture search. arXiv preprint arXiv: 1806. 09055 ( 2018 ).Google Scholar
Gilles Louppe, Joeri Hermans, and Kyle Cranmer. 2017. Adversarial variational optimization of non-diferentiable simulators. arXiv preprint arXiv:1707.07113 ( 2017 ).Google Scholar
Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, Yinhe Han, and Xiaowei Li. 2017. Flexflow: A flexible dataflow accelerator architecture for convolutional neural networks. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 553-564.Google ScholarCross Ref
Chris J Maddison, Andriy Mnih, and Yee Whye Teh. 2016. The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712 ( 2016 ).Google Scholar
Sanu Mathew, Sudhir Satpathy, Vikram Suresh, Mark Anders, Himanshu Kaul, Amit Agarwal, Steven Hsu, Gregory Chen, and Ram Krishnamurthy. 2015. 340 mV-1.1 V, 289 Gbps/W, 2090-gate nanoAES hardware accelerator with areaoptimized encrypt/decrypt GF (2 4) 2 polynomials in 22 nm tri-gate CMOS. IEEE Journal of Solid-State Circuits 50, 4 ( 2015 ), 1048-1058.Google ScholarCross Ref
Charith Mendis, Alex Renda, Saman Amarasinghe, and Michael Carbin. 2018. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks. arXiv preprint arXiv:1808. 07412 ( 2018 ).Google Scholar
Tomas Mikolov, Kai Chen, Gregory S. Corrado, and Jefrey Dean. 2013. Eficient Estimation of Word Representations in Vector Space. CoRR abs/1301.3781 ( 2013 ).Google Scholar
Anjum A Mohammed and Gihan Nagib. 2012. Optimal routing in ad-hoc network using genetic algorithm. Int. J. Advanced Networking and Applications 3, 05 ( 2012 ), 1323-1328.Google Scholar
Yurii Nesterov. 2013. Introductory lectures on convex optimization: A basic course. Vol. 87. Springer Science & Business Media.Google ScholarDigital Library
NVIDIA. [n.d.]. The NVIDIA Deep Learning Accelerator (NVDLA). http://nvdla.org/hw/v1/ias/programming_guide.html.Google Scholar
Hari Mohan Pandey, Ankit Chaudhary, and Deepti Mehrotra. 2014. A comparative review of approaches to prevent premature convergence in GA. Applied Soft Computing 24 ( 2014 ), 1047-1077.Google Scholar
Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Stephen W Keckler, and Joel Emer. 2019. Timeloop: A Systematic Approach to DNN Accelerator Evaluation. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 304-315.Google ScholarCross Ref
Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W Keckler, and William J Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA). IEEE, 27-40.Google ScholarDigital Library
Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2012. Understanding the exploding gradient problem. CoRR, abs/1211.5063 2 ( 2012 ).Google Scholar
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024-8035.Google Scholar
Tirthak Patel and Devesh Tiwari. 2020. CLITE: Eficient and QoS-Aware CoLocation of Multiple Latency-Critical Jobs for Warehouse Scale Computers. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 193-206.Google Scholar
Michael Pellauer, Yakun Sophia Shao, Jason Clemons, Neal Crago, Kartik Hegde, Rangharajan Venkatesan, Stephen W Keckler, Christopher W Fletcher, and Joel Emer. 2019. Bufets: An eficient and composable storage idiom for explicit decoupled data orchestration. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 137-151.Google ScholarDigital Library
Matthew Perry. 2019. Python module for simulated annealing. https://github. com/perrygeo/simanneal.Google Scholar
Nestor V Queipo, Raphael T Haftka, Wei Shyy, Tushar Goel, Rajkumar Vaidyanathan, and P Kevin Tucker. 2005. Surrogate-based analysis and optimization. Progress in aerospace sciences ( 2005 ).Google Scholar
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. In Acm Sigplan Notices, Vol. 48. ACM, 519-530.Google ScholarDigital Library
Brandon Reagen, José Miguel Hernández-Lobato, Robert Adolf, Michael Gelbart, Paul Whatmough, Gu-Yeon Wei, and David Brooks. 2017. A case for eficient accelerator design space exploration via Bayesian optimization. In 2017 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED). IEEE, 1-6.Google ScholarCross Ref
Alex Renda, Yishen Chen, Charith Mendis, and Michael Carbin. 2020. Dif Tune: Optimizing CPU Simulator Parameters with Learned Diferentiable Surrogates. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE.Google Scholar
Raanan Y Rohekar, Shami Nisimov, Yaniv Gurwicz, Guy Koren, and Gal Novik. 2018. Constructing deep neural networks by Bayesian network structure learning. In Advances in Neural Information Processing Systems. 3047-3058.Google Scholar
Ananda Samajdar, Yuhao Zhu, Paul Whatmough, Matthew Mattina, and Tushar Krishna. 2018. Scale-sim: Systolic cnn accelerator. arXiv preprint arXiv: 1811. 02883 ( 2018 ).Google Scholar
Sergey Shirobokov, Vladislav Belavin, Michael Kagan, Andrei Ustyuzhanin, and Atilim Gunes Baydin. 2020. Black-box optimization with local generative surrogates. In Workshop on Real World Experiment Design and Active Learning at International Conference on Machine Learning.Google Scholar
Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556 ( 2014 ). http://arxiv.org/abs/1409.1556Google Scholar
Age Smilde, Rasmus Bro, and Paul Geladi. 2005. Multi-way analysis: applications in the chemical sciences. John Wiley & Sons.Google Scholar
Selmar K Smit and AE Eiben. 2010. Parameter tuning of evolutionary algorithms: Generalist vs. specialist. In European conference on the applications of evolutionary computation. Springer, 542-551.Google ScholarDigital Library
Shaden Smith, Jongsoo Park, and George Karypis. 2017. Sparse tensor factorization on many-core processors with high-bandwidth memory. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 1058-1067.Google ScholarCross Ref
Nitish Srivastava, Geofrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1 ( 2014 ), 1929-1958.Google Scholar
Nitish Srivastava, Hanchen Jin, Shaden Smith, Hongbo Rong, David Albonesi, and Zhiru Zhang. 2020. Tensaurus: A versatile accelerator for mixed sparsedense tensor computations. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 689-702.Google ScholarCross Ref
Praveen Ranjan Srivastava and Tai-hoon Kim. 2009. Application of genetic algorithm in software testing. International Journal of software Engineering and its Applications 3, 4 ( 2009 ), 87-96.Google Scholar
Rainer Storn and Kenneth Price. 1997. Diferential evolution-a simple and eficient heuristic for global optimization over continuous spaces. Journal of global optimization 11, 4 ( 1997 ), 341-359.Google ScholarDigital Library
Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems. 1057-1063.Google Scholar
Richard S Sutton, David A McAllester, Satinder P Singh, Yishay Mansour, et al. 1999. Policy gradient methods for reinforcement learning with function approximation.. In NIPs, Vol. 99. Citeseer, 1057-1063.Google ScholarDigital Library
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition. 1-9.Google ScholarCross Ref
G Tomasi. 2005. Use of the properties of the Khatri-Rao product for the computation of Jacobian. Hessian, and gradient of the PARAFAC model under MATLAB ( 2005 ).Google Scholar
Ethan Tseng, Felix Yu, Yuting Yang, Fahim Mannan, Karl ST Arnaud, Derek Nowrouzezahrai, Jean-François Lalonde, and Felix Heide. 2019. Hyperparameter optimization in black-box image processing using diferentiable proxies. ACM Transactions on Graphics ( 2019 ).Google Scholar
Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu, and Shaojun Wei. 2017. Deep convolutional neural network architecture with reconfigurable computation patterns. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 8 ( 2017 ), 2220-2233.Google ScholarDigital Library
George Tucker, Andriy Mnih, Chris J Maddison, Dieterich Lawson, and Jascha Sohl-Dickstein. 2017. Rebar: Low-variance, unbiased gradient estimates for discrete latent variable models. arXiv preprint arXiv:1703.07370 ( 2017 ).Google Scholar
Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv: 1802. 04730 ( 2018 ).Google Scholar
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. 2019. Haq: Hardwareaware automated quantization with mixed precision. In Proceedings of the IEEE conference on computer vision and pattern recognition. 8612-8620.Google ScholarCross Ref
Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. 2019. Fbnet: Hardware-aware eficient convnet design via diferentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 10734-10742.Google ScholarCross Ref
Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, and Hadi Esmaeilzadeh. 2018. Ganax: A unified mimd-simd acceleration for generative adversarial networks. In 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 650-661.Google ScholarDigital Library
Jiecao Yu, Andrew Lukefahr, David Palframan, Ganesh Dasika, Reetuparna Das, and Scott Mahlke. 2017. Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism. In Proceedings of the 44th Annual International Symposium on Computer Architecture. 548-560.Google ScholarDigital Library
Size Zheng, Yun Liang, Shuo Wang, Renze Chen, and Kaiwen Sheng. 2020. FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System. In Proceedings of the TwentyFifth International Conference on Architectural Support for Programming Languages and Operating Systems. 859-873.Google ScholarDigital Library
Barret Zoph and Quoc V Le. 2016. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578 ( 2016 ).Google Scholar

Index Terms

Mind mappings: enabling efficient algorithm-accelerator mapping space search
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Use of mind mapping in search process to clarify information needs and improve search satisfaction

A mind map is an approach to the organisation of the human mind that prepares the ground for thinking. Inspired by the function of the mind in handling a situation, this article reports on an empirical study that evaluated the efficiency of mind map ...
Read More
A Fast Approximate Nearest Neighbor Search Algorithm in the Hamming Space

A fast approximate nearest neighbor search algorithm for the (binary) Hamming space is proposed. The proposed Error Weighted Hashing (EWH) algorithm is up to 20 times faster than the popular locality sensitive hashing (LSH) algorithm and works well even ...
Read More
I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space
Database Systems for Advanced Applications
Abstract
Furthest Neighbor search in high-dimensional space has been widely used in many applications such as recommendation systems. Because of the “curse of dimensionality” problem, c-approximate furthest neighbor (C-AFN) is a substitute as a trade-off ...
Read More

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN:9781450383172
DOI:10.1145/3445814
General Chair:
Tim Sherwood
University of California at Santa Barbara, USA
,
Program Chairs:
Emery Berger
University of Massachusetts at Amherst, USA
,
Christos Kozyrakis
Stanford University, USA
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 April 2021
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
gradient-based search
mapping space search
programmable domain-specific accelerators
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate535of2,713submissions,20%
Upcoming Conference
ASPLOS '25

Sponsor:

sigarch

sigarch

sigarch

29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

March 30 - April 3, 2025

Rotterdam , Netherlands
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 26
  Total Citations
  View Citations
- 1,572
  Total Downloads
- Downloads (Last 12 months)556
- Downloads (Last 6 weeks)65
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Mind mappings: enabling efficient algorithm-accelerator mapping space search

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Use of mind mapping in search process to clarify information needs and improve search satisfaction

A Fast Approximate Nearest Neighbor Search Algorithm in the Hamming Space

I/O Efficient Algorithm for c-Approximate Furthest Neighbor Search in High-Dimensional Space