MLPerf inference benchmark

VJ Reddi, C Cheng, D Kanter, P Mattson… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML
applications, the number of different ML inference systems has exploded. Over 100 …

Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

Enabling rack-scale confidential computing using heterogeneous trusted execution environment

J Zhu, R Hou, XF Wang, W Wang, J Cao… - … IEEE Symposium on …, 2020 - ieeexplore.ieee.org
With its huge real-world demands, large-scale confidential computing still cannot be
supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and …

Towards fast and scalable private inference

J Mo, K Garimella, N Neda, A Ebel… - Proceedings of the 20th …, 2023 - dl.acm.org
Privacy and security have rapidly emerged as first-order design constraints. Users now
demand more protection over who can see their data (confidentiality) as well as how it is …

Warehouse-scale video acceleration: co-design and deployment in the wild

P Ranganathan, D Stodolsky, J Calow… - Proceedings of the 26th …, 2021 - dl.acm.org
Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet
traffic, and video processing is also foundational to several other key workloads (video …

First-generation inference accelerator deployment at Facebook

M Anderson, B Chen, S Chen, S Deng, J Fix… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we provide a deep dive into the deployment of inference accelerators at
Facebook. Many of our ML workloads have unique characteristics, such as sparse memory …

Characterizing and optimizing end-to-end systems for private inference

K Garimella, Z Ghodsi, NK Jha, S Garg… - Proceedings of the 28th …, 2023 - dl.acm.org
In two-party machine learning prediction services, the client's goal is to query a remote
server's trained machine learning model to perform neural network inference in some …

Hercules: Heterogeneity-aware inference serving for at-scale personalized recommendation

L Ke, U Gupta, M Hempstead, CJ Wu… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Personalized recommendation is an important class of deep-learning applications that
powers a large collection of internet services and consumes a considerable amount of …

Similarity search for efficient active learning and search of rare concepts

C Coleman, E Chou, J Katz-Samuels… - Proceedings of the …, 2022 - ojs.aaai.org
Many active learning and search approaches are intractable for large-scale industrial
settings with billions of unlabeled examples. Existing approaches search globally for the …

One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment

AK Venkataramanan, C Stejerean… - … on Image Processing, 2023 - ieeexplore.ieee.org
The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a
state-of-the-art approach to video quality prediction that now pervades the streaming and …