research-article

Warehouse-scale video acceleration: co-design and deployment in the wild

Authors:
Parthasarathy Ranganathan, Google, USA
Daniel Stodolsky, Google, USA
Jeff Calow, Google, USA
Jeremy Dorfman, Google, USA
Marisabel Guevara, Google, USA
Clinton Wills Smullen IV, Google, USA
Aki Kuusela, Google, USA
Raghu Balasubramanian, Google, USA
Sandeep Bhatia, Google, USA
Prakash Chauhan, Google, USA
Anna Cheung, Google, USA
In Suk Chong, Google, USA
Niranjani Dasharathi, Google, USA
Jia Feng, Google, USA
Brian Fosco, Google, USA
Samuel Foss, Google, USA
Ben Gelb, Google, USA
Sara J. Gwin, Google, USA
Yoshiaki Hase, Google, USA
Da-ke He, Google, USA
C. Richard Ho, Google, USA
Roy W. Huffman Jr., Google, USA
Elisha Indupalli, Google, USA
Indira Jayaram, Google, USA
Poonacha Kongetira, Google, USA
Cho Mon Kyaw, Google, USA
Aaron Laursen, Google, USA
Yuan Li, Google, USA
Fong Lou, Google, USA
Kyle A. Lucke, Google, USA
JP Maaninen, Google, USA
Ramon Macias, Google, USA
Maire Mahony, Google, USA
David Alexander Munday, Google, USA
Srikanth Muroor, Google, USA
Narayana Penukonda, Google, USA
Eric Perkins-Argueta, Google, USA
Devin Persaud, Google, USA
Alex Ramirez, Google, USA
Ville-Mikko Rautio, Google, USA
Yolanda Ripley, Google, USA
Amir Salek, Google, USA
Sathish Sekar, Google, USA
Sergey N. Sokolov, Google, USA
Rob Springer, Google, USA
Don Stark, Google, USA
Mercedes Tan, Google, USA
Mark S. Wachsler, Google, USA
Andrew C. Walton, Google, USA
David A. Wickeraad, Google, USA
Alvin Wijaya, Google, USA
Hon Kwan Wu, Google, USA, ORCID 0000-0003-3138-5575

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, April 2021, Pages 600–615
https://doi.org/10.1145/3445814.3446723

Published: 17 April 2021

ABSTRACT

Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design, including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and for co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks, improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Special purpose systems
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Hardware-software codesign

Published in
ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814
General Chair: Tim Sherwood, University of California at Santa Barbara, USA
Program Chairs: Emery Berger, University of Massachusetts at Amherst, USA; Christos Kozyrakis, Stanford University, USA
Copyright © 2021 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
domain-specific accelerators
hardware-software codesign
video transcoding
warehouse-scale computing
Acceptance Rates
Overall Acceptance Rate: 535 of 2,713 submissions, 20%