ABSTRACT
User experience on social media and web platforms such as LinkedIn depends heavily on the performance and scalability of their products. Applications such as personalized search and recommendations require real-time scoring, under strict latency constraints, of millions of structured candidate documents associated with each query. In such applications, the query incorporates the context of the user (in addition to search keywords, if present), and can therefore become very large, comprising thousands of Boolean clauses over hundreds of document attributes. Because it is infeasible to retrieve and score every matching document from the underlying inverted index, candidate selection techniques are needed. We propose CaSMoS, a machine-learned candidate selection framework built on the Weighted AND (WAND) query operator. Our framework is designed to prune irrelevant documents and retrieve documents that are likely to be part of the top-k results for the query. We apply a constrained feature selection algorithm to learn positive weights for the feature combinations that make up the weighted candidate selection query. We have implemented and deployed this system to run in real time on LinkedIn's Galene search platform. We perform an extensive evaluation across different training-data approaches and parameter settings, and investigate the scalability of the proposed candidate selection model. Deploying this system in LinkedIn's job recommendation engine has reduced latency significantly (by up to 25%) without sacrificing the quality of the retrieved results, paving the way for more sophisticated scoring models.
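The abstract describes CaSMoS as issuing a weighted candidate selection query through the WAND operator. The Python sketch below illustrates a WAND-style traversal in the style of Broder et al. (CIKM 2003): it returns every document whose summed clause weights reach a threshold `theta`, and skips documents that provably cannot get there without scoring them. It assumes that each learned feature combination has already been compiled into one Boolean clause with a sorted posting list of document IDs and a positive weight. The clause names, weights, threshold, and the functions `wand` and `brute_force` are all hypothetical, and this is not Galene's implementation.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

INF = float("inf")


class Cursor:
    """Walks one clause's sorted posting list of document IDs."""

    def __init__(self, postings: List[int], weight: float) -> None:
        self.postings = postings
        self.weight = weight  # with 0/1 clause matches, weight is also the clause's upper bound
        self.pos = 0

    @property
    def doc(self) -> float:
        # Current document ID, or INF once the list is exhausted.
        return self.postings[self.pos] if self.pos < len(self.postings) else INF

    def skip_to(self, target: int) -> None:
        # Jump to the first posting >= target (binary search stands in for skip pointers).
        self.pos = bisect_left(self.postings, target, self.pos)


def wand(posting_lists: Dict[str, List[int]], weights: Dict[str, float],
         theta: float) -> List[Tuple[int, float]]:
    """Return (doc_id, score) for every doc whose summed clause weights reach theta."""
    cursors = []
    for clause, postings in posting_lists.items():
        if weights[clause] <= 0:
            raise ValueError("this sketch assumes strictly positive clause weights")
        cursors.append(Cursor(sorted(set(postings)), weights[clause]))
    results = []
    while True:
        cursors.sort(key=lambda c: c.doc)
        # Pivot: first cursor at which the accumulated upper bounds reach theta.
        acc, pivot = 0.0, None
        for i, cur in enumerate(cursors):
            if cur.doc == INF:
                break
            acc += cur.weight
            if acc >= theta:
                pivot = i
                break
        if pivot is None:
            return results  # remaining clauses cannot reach theta for any document
        pivot_doc = cursors[pivot].doc
        if cursors[0].doc == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it exactly.
            score = sum(c.weight for c in cursors if c.doc == pivot_doc)
            results.append((pivot_doc, score))
            for c in cursors:
                if c.doc == pivot_doc:
                    c.skip_to(pivot_doc + 1)
        else:
            # Documents before pivot_doc cannot reach theta, so skip them unscored.
            cursors[0].skip_to(pivot_doc)


def brute_force(posting_lists: Dict[str, List[int]], weights: Dict[str, float],
                theta: float) -> List[Tuple[int, float]]:
    """Exhaustive scoring of every matching document, for comparison."""
    scores: Dict[int, float] = {}
    for clause, postings in posting_lists.items():
        for d in set(postings):
            scores[d] = scores.get(d, 0.0) + weights[clause]
    return [(d, s) for d, s in sorted(scores.items()) if s >= theta]


if __name__ == "__main__":
    # Hypothetical clauses pairing user-context attributes with document fields.
    postings = {
        "title=engineer": [1, 2, 5, 8],
        "skill=java": [2, 3, 5],
        "location=sf": [1, 5, 7, 8],
    }
    weights = {"title=engineer": 2.0, "skill=java": 1.0, "location=sf": 1.5}
    print(wand(postings, weights, theta=3.0))
    # → [(1, 3.5), (2, 3.0), (5, 4.5), (8, 3.5)]
```

In this sketch, positive weights let each clause's weight double as its score upper bound, which the pivot step relies on when it skips documents; documents 3 and 7 above match some clauses but are never returned because their weights cannot reach `theta`. `brute_force` is included only to check that the pruned traversal returns the same documents as exhaustive scoring.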
Index Terms
- CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents