ABSTRACT
Modern computing increasingly relies on specialization to meet growing performance and efficiency requirements. A core challenge in designing such specialized hardware architectures is how to perform mapping space search, i.e., search for an optimal mapping from algorithm to hardware. Prior work shows that choosing an inefficient mapping can lead to multiplicative-factor efficiency overheads. Additionally, the search space is not only large but also non-convex and non-smooth, precluding advanced search techniques. As a result, previous works are forced to implement mapping space search using expert choices or sub-optimal search heuristics.
This work proposes Mind Mappings, a novel gradient-based search method for algorithm-accelerator mapping space search. The key idea is to derive a smooth, differentiable approximation to the otherwise non-smooth, non-convex search space. With a smooth, differentiable approximation, we can leverage efficient gradient-based search algorithms to find high-quality mappings. We extensively compare Mind Mappings to black-box optimization schemes used in prior work. When tasked to find mappings for two important workloads (CNN and MTTKRP), Mind Mappings finds mappings that achieve an average 1.40×, 1.76×, and 1.29× (when run for a fixed number of steps) and 3.16×, 4.19×, and 2.90× (when run for a fixed amount of time) better energy-delay product (EDP) relative to Simulated Annealing, Genetic Algorithms, and Reinforcement Learning, respectively. Meanwhile, Mind Mappings returns mappings with only 5.32× higher EDP than a possibly unachievable theoretical lower bound, indicating proximity to the global optimum.
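The key idea above (fit a smooth, differentiable approximation to a non-smooth cost, run gradient-based search on the approximation, then project back to a valid mapping) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional "tile size" mapping, the toy cost function `true_cost`, and the quadratic least-squares surrogate, which stands in here for whatever learned approximation the authors actually use.

```python
MAX_TILE = 64  # hypothetical range of valid tile sizes: 1..64


def true_cost(tile):
    """Toy non-smooth cost (illustrative only): a smooth bowl plus a
    discontinuous penalty for tile sizes that are not multiples of 8."""
    penalty = 0.0 if tile % 8 == 0 else 5.0
    return (tile - 37) ** 2 / 100.0 + penalty


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a*x^2 + b*x + c via the 3x3 normal equations."""
    feats = [[x * x, x, 1.0] for x in xs]
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(f[i] * f[j] for f in feats) for j in range(3)]
         + [sum(f[i] * y for f, y in zip(feats, ys))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def gradient_search(a, b, start_tile=6, steps=200, lr=0.01):
    """Gradient descent on the surrogate a*x^2 + b*x + c in normalized space,
    clamping to the valid box each step, then rounding to an integer tile."""
    x = start_tile / MAX_TILE
    for _ in range(steps):
        x -= lr * (2.0 * a * x + b)           # surrogate gradient step
        x = min(max(x, 1.0 / MAX_TILE), 1.0)  # project back into [1/64, 1]
    return int(round(x * MAX_TILE))            # project to a discrete mapping


# Sample the expensive cost sparsely, fit the surrogate, then search on it.
samples = list(range(1, MAX_TILE + 1, 3))
a, b, c = fit_quadratic([t / MAX_TILE for t in samples],
                        [true_cost(t) for t in samples])
best_tile = gradient_search(a, b)
print(best_tile, true_cost(best_tile))
```

The search never differentiates `true_cost` itself; gradients come only from the surrogate, and the true cost is consulted again only to evaluate the final projected mapping. A quadratic is far too simple for a real mapping space and is used here only to keep the sketch short.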
Index Terms
- Mind mappings: enabling efficient algorithm-accelerator mapping space search