DOI: 10.1145/3445814.3446762
Research article · Public Access

Mind mappings: enabling efficient algorithm-accelerator mapping space search

Published: 17 April 2021

ABSTRACT

Modern-day computing increasingly relies on specialization to satiate growing performance and efficiency requirements. A core challenge in designing such specialized hardware architectures is how to perform mapping space search, i.e., search for an optimal mapping from algorithm to hardware. Prior work shows that choosing an inefficient mapping can lead to multiplicative-factor efficiency overheads. Additionally, the search space is not only large but also non-convex and non-smooth, precluding advanced search techniques. As a result, previous works are forced to implement mapping space search using expert choices or sub-optimal search heuristics.
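
To make the scale of the space concrete, here is an illustrative back-of-the-envelope enumeration. It is not from the paper: the loop bounds, the three-level hierarchy, and the assumption of independent per-level tile sizes and loop orders are all hypothetical simplifications, so the count is order-of-magnitude only.

```python
from itertools import permutations
from math import prod

# Hypothetical illustration: count candidate mappings for a tiny
# 4-deep loop nest mapped onto a toy 3-level memory hierarchy.
def divisors(n: int) -> list[int]:
    return [d for d in range(1, n + 1) if n % d == 0]

loop_bounds = {"N": 16, "C": 64, "K": 64, "P": 56}  # toy CNN-like dims
LEVELS = 3                                          # toy memory levels

# A tile size per dimension at each level (ignoring inter-level
# divisibility constraints, so this over-counts slightly) ...
tile_choices = prod(len(divisors(b)) ** LEVELS for b in loop_bounds.values())
# ... times an independent loop order at each level.
order_choices = len(list(permutations(loop_bounds))) ** LEVELS

print(f"tile-size choices:  {tile_choices:,}")
print(f"loop-order choices: {order_choices:,}")
print(f"total mappings:    ~{tile_choices * order_choices:,}")  # ~1e14
```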

This work proposes Mind Mappings, a novel gradient-based search method for algorithm-accelerator mapping space search. The key idea is to derive a smooth, differentiable approximation to the otherwise non-smooth, non-convex search space. With a smooth, differentiable approximation, we can leverage efficient gradient-based search algorithms to find high-quality mappings. We extensively compare Mind Mappings to black-box optimization schemes used in prior work. When tasked to find mappings for two important workloads (CNN and MTTKRP), Mind Mappings finds mappings that achieve an average 1.40×, 1.76×, and 1.29× (when run for a fixed number of steps) and 3.16×, 4.19×, and 2.90× (when run for a fixed amount of time) better energy-delay product (EDP) relative to Simulated Annealing, Genetic Algorithms, and Reinforcement Learning, respectively. Meanwhile, Mind Mappings returns mappings with only 5.32× higher EDP than a possibly unachievable theoretical lower bound, indicating proximity to the global optimum.
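
The abstract does not spell out the mechanics, but the stated key idea (fit a smooth, differentiable surrogate to the cost surface, then run gradient-based search on it) can be sketched as follows. Everything here is a hypothetical toy: the mapping encoding, the stand-in cost function, the network shape, and the projection step are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

DIM = 8  # hypothetical: a mapping encoded as an 8-dim real vector

def true_edp(m: torch.Tensor) -> torch.Tensor:
    # Stand-in for an expensive, non-differentiable cost model:
    # rounding makes it piecewise-constant, like a real mapping space.
    return ((m.round() - 3.0) ** 2).sum(dim=-1) + 1.0

# Step 1: fit a smooth surrogate of the cost on random mapping samples.
surrogate = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for _ in range(2000):
    batch = torch.rand(256, DIM) * 8.0            # random mappings in [0, 8)
    loss = nn.functional.mse_loss(surrogate(batch).squeeze(-1),
                                  true_edp(batch))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Step 2: gradient-descend the *input* of the frozen surrogate.
surrogate.requires_grad_(False)
m = (torch.rand(DIM) * 8.0).requires_grad_()
search = torch.optim.Adam([m], lr=0.05)
for _ in range(500):
    cost = surrogate(m).sum()                     # scalar for backward()
    search.zero_grad()
    cost.backward()
    search.step()
    with torch.no_grad():
        m.clamp_(0.0, 8.0)                        # project to the valid range

candidate = m.detach().round()                    # snap to a legal mapping
print("candidate mapping:", candidate.tolist())
print("true EDP:", true_edp(candidate).item())
```

Because the surrogate is smooth, the inner loop is ordinary gradient descent; the final clamp-and-round step projects the continuous result back onto a legal mapping, which is where a real implementation would enforce the accelerator's constraints.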


Published in

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021, 1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814

Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States

        Acceptance Rates

Overall acceptance rate: 535 of 2,713 submissions, 20%
