Building a Large-Scale Recommendation System: People You May Know

LinkedIn’s “People You May Know” (PYMK) feature has long been used by our members to form connections with other members and expand their networks. It’s an essential part of how we fulfill our mission of connecting the world’s professionals to make them more productive and successful. There are many reasons why a member might find a connection valuable - from finding a professional mentor, to networking with a future employer, to reestablishing a connection with a peer from school. These various motivations for connection, combined with our more than one billion members, mean that selecting a candidate pool to display for PYMK is a monumental task. Today, PYMK processes hundreds of terabytes of data and hundreds of billions of potential connections daily to recommend other members you may want to connect with.

The main challenge is that it’s impossible to sift through the entire candidate inventory to generate recommendations within a reasonable time frame. Methods such as Negative Sampling, Adaptive Importance Sampling, and Hierarchical Softmax can help scale the training process, but inference (i.e., scoring) remains a challenge with such a large item inventory. This is due not only to the size of the candidate pool, but also to the multiple factors we’re optimizing for in creating the final PYMK list, such as the likelihood of you sending an invitation and of that invitation being accepted.
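
As a rough illustration of the training-side idea (not PYMK’s actual training code), a sampled-softmax loss with randomly drawn negatives can be sketched in a few lines of PyTorch. The tensor names, shapes, and the uniform negative sampler below are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(member_emb, pos_item_emb, item_table, num_negatives=100):
    """Approximate a full softmax over the item inventory by scoring each
    positive against a small set of randomly sampled negatives.
    member_emb: (B, D), pos_item_emb: (B, D), item_table: (V, D)."""
    batch_size = member_emb.size(0)
    # Sample negative item indices uniformly from the (huge) item table.
    neg_idx = torch.randint(0, item_table.size(0), (batch_size, num_negatives))
    neg_item_emb = item_table[neg_idx]                                  # (B, N, D)

    pos_logits = (member_emb * pos_item_emb).sum(-1, keepdim=True)      # (B, 1)
    neg_logits = torch.einsum("bd,bnd->bn", member_emb, neg_item_emb)   # (B, N)

    logits = torch.cat([pos_logits, neg_logits], dim=1)                 # (B, 1+N)
    labels = torch.zeros(batch_size, dtype=torch.long)                  # positive is index 0
    return F.cross_entropy(logits, labels)
```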


Figure 1: A general multi-stage ranking system, where each stage acts as a funnel, reducing the search space for the next stage.

In this blog, we cover how we built our large-scale recommendation system and scaled its scoring mechanism over the last two years to handle more than a billion items while still ensuring high relevance and low serving latency in the recommendations shared with members. We discuss the core design behind PYMK: a multi-stage ranking system with clearly defined stages serving several goals and leveraging varied algorithms. Each stage acts as a funnel, reducing the search space for the next stage. The end result is a final recommendation pool that balances multiple candidate sources, the likelihood of a connection bringing value, and fairness (among other parameters) to create the best possible PYMK experience for both inviters and invitees.

Candidate Generation (L0 Ranking)

The main purpose of our L0 Ranking is to select a few thousand candidates from an inventory of billions of items. The goal is not to rank the most relevant candidates at the top, but rather to ensure that the most relevant candidates are selected. This makes Recall@k the right metric for evaluating this stage.
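
For reference, Recall@k can be computed with a simple function like the one below (a minimal sketch; the inputs and ids are illustrative):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of truly relevant items that appear in the top-k retrieved set.

    retrieved_ids: ranked candidate ids from the L0 stage (best first).
    relevant_ids:  ground-truth ids (e.g., connections the member actually made).
    """
    if not relevant_ids:
        return 0.0
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for item in relevant_ids if item in top_k)
    return hits / len(relevant_ids)

# Example: 2 of the 3 relevant members were retrieved in the top 5 -> 0.67
print(recall_at_k(["a", "b", "c", "d", "e"], ["b", "e", "z"], k=5))
```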

The stage consists of multiple candidate generation (CG) sources. These range from graph-based CG sources that generate candidates by performing random graph walks (e.g., n-hop neighbors), to embedding-based retrieval (EBR) sources that generate candidates via similarity scores, to simple heuristic sources (e.g., new LinkedIn members in your geographic area).
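
To make the EBR idea concrete, here is a brute-force sketch of similarity-based retrieval. A production system would use an approximate nearest-neighbor index rather than scoring every candidate, and the embedding matrices and id lists here are placeholders:

```python
import numpy as np

def ebr_candidates(member_vec, candidate_matrix, candidate_ids, top_k=1000):
    """Return the top_k candidate ids by cosine similarity to the member embedding.

    member_vec:       (d,) embedding of the viewing member.
    candidate_matrix: (n, d) embeddings of potential connections (n > top_k).
    candidate_ids:    length-n list of member ids aligned with candidate_matrix.
    """
    # Normalize so the dot product equals cosine similarity.
    member_vec = member_vec / np.linalg.norm(member_vec)
    norms = np.linalg.norm(candidate_matrix, axis=1, keepdims=True)
    scores = (candidate_matrix / norms) @ member_vec

    top_idx = np.argpartition(-scores, top_k)[:top_k]        # unordered top_k
    top_idx = top_idx[np.argsort(-scores[top_idx])]          # sort just those k
    return [(candidate_ids[i], float(scores[i])) for i in top_idx]
```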

Light Ranker (L1 Ranking)

The L1 Ranking stage takes the few thousand candidates generated by the L0 Ranking above, calibrates and ranks them against a common objective, and then reduces them to a select few hundred of the most relevant candidates. Because the L0 Ranking has multiple CG sources—which could be generating very diverse candidates (e.g., graph-based versus similarity-based)—calibration is an important part of this stage to make these diverse candidates comparable. A lightweight model like logistic regression or XGBoost could be used for calibration.
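
As an illustrative sketch (not our production code), a calibrated light ranker along these lines could be assembled from XGBoost plus Platt (sigmoid) calibration in scikit-learn. The feature matrices, labels, and hyperparameters below are placeholder assumptions:

```python
import numpy as np
import xgboost as xgb
from sklearn.calibration import CalibratedClassifierCV

# Placeholder member-candidate pair features and labels (illustrative only);
# in practice these would come from the pooled L0 candidate sources.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 20))
y_train = (X_train[:, 0] + rng.normal(size=10_000) > 0).astype(int)
X_candidates = rng.normal(size=(3_000, 20))

base_model = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)

# Sigmoid (Platt) calibration makes scores from diverse CG sources comparable
# as probabilities before only the top few hundred candidates are kept.
light_ranker = CalibratedClassifierCV(base_model, method="sigmoid", cv=3)
light_ranker.fit(X_train, y_train)

probs = light_ranker.predict_proba(X_candidates)[:, 1]
top_500 = probs.argsort()[::-1][:500]   # indices of the ~500 candidates passed to L2
```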

This stage also uses Recall@k as the primary evaluation metric, but k is now in the 500-800 range (in the L0 Ranking stage, k was in the 3,000-5,000 range).

Rich Ranker (L2 Ranking)

The goal of this stage is to rank the most relevant candidates at the top and then further reduce the candidate pool based on this ranking. The L2 Ranking stage consists of multiple heavy models that predict the probability and the value of different engagement events (e.g., invitations sent, invitations accepted). The models are usually deep neural networks consuming the most powerful member-candidate pair features.
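
Conceptually, such a ranker might resemble the multi-task network sketched below in PyTorch. The layer sizes, feature dimension, and head names are assumptions rather than the production architecture:

```python
import torch
import torch.nn as nn

class EngagementRanker(nn.Module):
    """Shared MLP over member-candidate pair features with one output head
    per engagement event (e.g., invitation sent, invitation accepted)."""

    def __init__(self, num_features: int, hidden: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.invite_sent_head = nn.Linear(hidden, 1)
        self.invite_accepted_head = nn.Linear(hidden, 1)

    def forward(self, pair_features: torch.Tensor):
        h = self.backbone(pair_features)
        return {
            "p_invite_sent": torch.sigmoid(self.invite_sent_head(h)),
            "p_invite_accepted": torch.sigmoid(self.invite_accepted_head(h)),
        }

model = EngagementRanker(num_features=128)
scores = model(torch.randn(4, 128))   # probabilities for a batch of 4 pairs
```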

High-precision metrics like AUC and Precision@k are used in this stage. Usually, the output scores of the models in this stage are used in subsequent re-ranking or even in the models of other teams, so metrics like ECE (expected calibration error) are also used.
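
For reference, ECE can be estimated with a simple binning scheme like the following sketch, which uses the positive-class probability formulation and equal-width bins (both choices are assumptions):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Average |observed positive rate - mean predicted probability| over
    equal-width probability bins, weighted by the fraction of samples per bin."""
    probs, labels = np.asarray(probs, dtype=float), np.asarray(labels, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs > lo) & (probs <= hi)
        if mask.any():
            avg_conf = probs[mask].mean()   # mean predicted probability in the bin
            avg_acc = labels[mask].mean()   # observed positive rate in the bin
            ece += mask.mean() * abs(avg_acc - avg_conf)
    return ece

# A well-calibrated model: predicted probabilities track observed rates.
print(expected_calibration_error([0.1, 0.8, 0.65, 0.3], [0, 1, 1, 0]))
```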

Re-Ranker

This is the final stage, which consists of multiple re-rankers, such as fairness re-rankers to ensure fairness in terms of protected attributes like gender and age, diversity re-rankers to help ensure varied interests and intents are expressed in the recommendations, and re-rankers to avoid outcomes like overrepresenting platform power users.
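
As a toy illustration of one such re-ranking pass (not our actual re-rankers), the sketch below caps how many recommendations any single attribute value can contribute to the top of the list; the attribute and cap are hypothetical:

```python
def cap_rerank(ranked_candidates, attribute_of, max_per_value):
    """Walk the ranked list and defer candidates whose attribute value has
    already hit its cap, appending the deferred candidates at the end."""
    counts, kept, deferred = {}, [], []
    for cand in ranked_candidates:
        value = attribute_of(cand)
        if counts.get(value, 0) < max_per_value:
            counts[value] = counts.get(value, 0) + 1
            kept.append(cand)
        else:
            deferred.append(cand)
    return kept + deferred

# Example: allow at most 2 recommendations per (hypothetical) candidate segment.
ranked = [("a", "power_user"), ("b", "power_user"), ("c", "power_user"), ("d", "new_member")]
print(cap_rerank(ranked, attribute_of=lambda c: c[1], max_per_value=2))
```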

This stage also employs Bayesian optimization to estimate the most important parameters, since there are multiple objectives we’re optimizing for in PYMK candidate selection. The multiple models predicting different engagement events in the L2 Ranking are linearly combined using weights. These weights are one of the most important parameters of the system and are estimated in the re-ranking stage via Bayesian optimization techniques.
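
For example, the linear combination and its weight search could be sketched with a Bayesian optimization library such as scikit-optimize. The proxy utility function, weight ranges, and library choice below are illustrative assumptions, not the objective we optimize in production:

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

# Placeholder L2 model outputs for a batch of member-candidate pairs.
rng = np.random.default_rng(0)
p_invite_sent = rng.uniform(size=1_000)
p_invite_accepted = rng.uniform(size=1_000)

def combined_score(weights):
    w_sent, w_accepted = weights
    return w_sent * p_invite_sent + w_accepted * p_invite_accepted

def negative_utility(weights):
    """Stand-in objective: in production this would be an estimate of member
    value for the ranking induced by `weights` (e.g., from A/B test readouts)."""
    scores = combined_score(weights)
    proxy_utility = float(np.mean(scores * p_invite_accepted))  # hypothetical proxy
    return -proxy_utility   # gp_minimize minimizes, so negate the utility

search_space = [Real(0.0, 1.0, name="w_sent"), Real(0.0, 1.0, name="w_accepted")]
result = gp_minimize(negative_utility, search_space, n_calls=30, random_state=0)
print("estimated weights:", result.x)
```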

Figure 2: PYMK’s multi-stage ranking system. PYMK has three categories of candidate generation: graph-based, similarity-based, and heuristic-based. An XGBoost light ranker follows this. After that, PYMK has multiple neural network-based rankers that estimate the probability of an invitation being sent, an invitation being accepted, and the like. At the final stage, there are re-rankers performing Bayesian optimization and ensuring fairness and diversity.

Online Evaluation

As mentioned, each stage is evaluated via its corresponding offline metrics (L0 and L1 Ranking stages via Recall@k, L2 Ranking stage via AUC/Precision@k, Re-Ranking stage via log-likelihood and diversity metrics). To evaluate the entire system, we rely on A/B tests, which also tell us the online performance of our system. Offline metrics rarely match the online performance measures, which could be due to multiple factors:

  1. Presentation biases (missing profile photo, laggy UI, position bias, etc.).
  2. Unintentional errors in deploying models to production.
  3. Discrepancy between the distribution of offline training data and online data.

Due to this discrepancy between offline and online evaluation, we trust online evaluation via A/B tests to judge our PYMK system’s performance. The launch of multi-stage ranking for PYMK delivered some of the biggest improvements in member engagement and retention in the past 6 years and continues to do so.

Future Work

Such a complex system continues to pose challenges in terms of maintainability and monitoring. Additionally, there’s the question of how tightly each of the stages should be coupled—while tight coupling increases the accuracy of the overall system, it slows down the development speed. Tackling feedback loops in a multi-stage system is another big challenge that hasn’t been systematically explored by ML practitioners yet. 

Our teams continue the journey of solving these challenges and adopting new techniques to reach their end goal of building a high-quality, highly functional large-scale recommendation system that helps members connect with other members and have productive conversations.

Acknowledgements

It takes a lot of talent and dedication to build the AI systems that drive our mission. We are grateful to the following team members for their contributions to building a large-scale recommendation system.

PYMK AI: Yafei Wang, Divya Venugopalan, Ankan Saha, Siyuan Gao, Xinpei Ma, Yangzhan Yang, Yong Hu, Ayan Acharya, Yiou Xiao, and John Liu.

Graph Infra: Juan A. Colmenares.

FAIT: Sathiya Keerthi Selvaraj, Haichao Wei, Tie Wang, Xiaobing Xue, Chengming Jiang, Sen Zhou, and QQ Song.

Growth: Netra Malagi, Chiachi Lo, Nishant Satya Lakshmikanth, Jugpreet Singh Talwar, Xukai Wang, Pratham Alag, Caitlin Crump, Amber Jin, Jenny Jiang and Albert Cui.

Special thanks to the leadership: Shipeng Yu, Tim Jurka, Bobby Nakamoto, Naman Goel, Souvik Ghosh, and Necip Fazil for their instrumental support, and to Benito Leyva, Will Cheng, Jon Adams, Dina To, Katherine Vaiente, Sathiya Keerthi Selvaraj, Yafei Wang, and Chunan Zeng for helping us improve the quality of this post.