MLPerf inference benchmark
Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML
applications, the number of different ML inference systems has exploded. Over 100 …
Understanding training efficiency of deep learning recommendation models at scale
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …
Enabling rack-scale confidential computing using heterogeneous trusted execution environment
With its huge real-world demands, large-scale confidential computing still cannot be
supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and …
Towards fast and scalable private inference
Privacy and security have rapidly emerged as first order design constraints. Users now
demand more protection over who can see their data (confidentiality) as well as how it is …
Warehouse-scale video acceleration: co-design and deployment in the wild
P Ranganathan, D Stodolsky, J Calow… - Proceedings of the 26th …, 2021 - dl.acm.org
Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet
traffic, and video processing is also foundational to several other key workloads (video …
First-generation inference accelerator deployment at Facebook
M Anderson, B Chen, S Chen, S Deng, J Fix… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we provide a deep dive into the deployment of inference accelerators at
Facebook. Many of our ML workloads have unique characteristics, such as sparse memory …
Characterizing and optimizing end-to-end systems for private inference
In two-party machine learning prediction services, the client's goal is to query a remote
server's trained machine learning model to perform neural network inference in some …
Hercules: Heterogeneity-aware inference serving for at-scale personalized recommendation
Personalized recommendation is an important class of deep-learning applications that
powers a large collection of internet services and consumes a considerable amount of …
Similarity search for efficient active learning and search of rare concepts
Many active learning and search approaches are intractable for large-scale industrial
settings with billions of unlabeled examples. Existing approaches search globally for the …
One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment
AK Venkataramanan, C Stejerean… - … on Image Processing, 2023 - ieeexplore.ieee.org
The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a
state-of-the-art approach to video quality prediction, that now pervades the streaming and …