Google Scholar

VJ Reddi, C Cheng, D Kanter, P Mattson… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML
applications, the number of different ML inference systems has exploded. Over 100 …

Cited by 460

Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org

The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

Cited by 90

Enabling rack-scale confidential computing using heterogeneous trusted execution environment

J Zhu, R Hou, XF Wang, W Wang, J Cao… - … IEEE Symposium on …, 2020 - ieeexplore.ieee.org

With its huge real-world demands, large-scale confidential computing still cannot be
supported by today's Trusted Execution Environment (TEE), due to the lack of scalable and …

Cited by 78

Towards fast and scalable private inference

J Mo, K Garimella, N Neda, A Ebel… - Proceedings of the 20th …, 2023 - dl.acm.org

Privacy and security have rapidly emerged as first-order design constraints. Users now
demand more protection over who can see their data (confidentiality) as well as how it is …

Cited by 3

Warehouse-scale video acceleration: co-design and deployment in the wild

P Ranganathan, D Stodolsky, J Calow… - Proceedings of the 26th …, 2021 - dl.acm.org

Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet
traffic, and video processing is also foundational to several other key workloads (video …

Cited by 37

First-generation inference accelerator deployment at Facebook

M Anderson, B Chen, S Chen, S Deng, J Fix… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we provide a deep dive into the deployment of inference accelerators at
Facebook. Many of our ML workloads have unique characteristics, such as sparse memory …

Cited by 33

Characterizing and optimizing end-to-end systems for private inference

K Garimella, Z Ghodsi, NK Jha, S Garg… - Proceedings of the 28th …, 2023 - dl.acm.org

In two-party machine learning prediction services, the client's goal is to query a remote
server's trained machine learning model to perform neural network inference in some …

Cited by 19

Hercules: Heterogeneity-aware inference serving for at-scale personalized recommendation

L Ke, U Gupta, M Hempstead, CJ Wu… - … Symposium on High …, 2022 - ieeexplore.ieee.org

Personalized recommendation is an important class of deep-learning applications that
powers a large collection of internet services and consumes a considerable amount of …

Cited by 19

Similarity search for efficient active learning and search of rare concepts

C Coleman, E Chou, J Katz-Samuels… - Proceedings of the …, 2022 - ojs.aaai.org

Many active learning and search approaches are intractable for large-scale industrial
settings with billions of unlabeled examples. Existing approaches search globally for the …

Cited by 30

One Transform To Compute Them All: Efficient Fusion-Based Full-Reference Video Quality Assessment

AK Venkataramanan, C Stejerean… - … on Image Processing, 2023 - ieeexplore.ieee.org

The Visual Multimethod Assessment Fusion (VMAF) algorithm has recently emerged as a
state-of-the-art approach to video quality prediction, that now pervades the streaming and …

Cited by 4

Accelerating Facebook’s infrastructure with application-specific hardware

MLPerf inference benchmark