DOI: 10.1145/3445814.3446723
research-article

Warehouse-scale video acceleration: co-design and deployment in the wild

Published: 17 April 2021

ABSTRACT

Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet traffic, and video processing is also foundational to several other key workloads (video conferencing, virtual/augmented reality, cloud gaming, video in Internet-of-Things devices, etc.). The importance of these workloads motivates larger video processing infrastructures and – with the slowing of Moore’s law – specialized hardware accelerators to deliver more computing at higher efficiencies. This paper describes the design and deployment, at scale, of a new accelerator targeted at warehouse-scale video transcoding. We present our hardware design, including a new accelerator building block – the video coding unit (VCU) – and discuss key design trade-offs for balanced systems at data center scale and for co-designing accelerators with large-scale distributed software systems. We evaluate these accelerators “in the wild” serving live data center jobs, demonstrating 20-33x improved efficiency over our prior well-tuned non-accelerated baseline. Our design also enables effective adaptation to changing bottlenecks, improved failure management, and new workload capabilities not otherwise possible with prior systems. To the best of our knowledge, this is the first work to discuss video acceleration at scale in large warehouse-scale environments.

Published in

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
April 2021
1090 pages
ISBN: 9781450383172
DOI: 10.1145/3445814

Copyright © 2021 Owner/Author

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 535 of 2,713 submissions, 20%
