DOI: 10.1145/2939672.2939718
research-article

CaSMoS: A Framework for Learning Candidate Selection Models over Structured Queries and Documents

Published: 13 August 2016

ABSTRACT

User experience at social media and web platforms such as LinkedIn is heavily dependent on the performance and scalability of their products. Applications such as personalized search and recommendations require real-time scoring of millions of structured candidate documents associated with each query, with strict latency constraints. In such applications, the query incorporates the context of the user (in addition to search keywords, if present), and hence can become very large, comprising thousands of Boolean clauses over hundreds of document attributes. Consequently, candidate selection techniques need to be applied, since it is infeasible to retrieve and score all matching documents from the underlying inverted index. We propose CaSMoS, a machine-learned candidate selection framework that makes use of the Weighted AND (WAND) query. Our framework is designed to prune irrelevant documents and retrieve documents that are likely to be part of the top-k results for the query. We apply a constrained feature selection algorithm to learn positive weights for feature combinations that are used as part of the weighted candidate selection query. We have implemented and deployed this system to be executed in real time using LinkedIn's Galene search platform. We perform extensive evaluation with different training data approaches and parameter settings, and investigate the scalability of the proposed candidate selection model. Our deployment of this system as part of LinkedIn's job recommendation engine has resulted in a significant reduction in latency (up to 25%) without sacrificing the quality of the retrieved results, thereby paving the way for more sophisticated scoring models.
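
As a rough illustration of the weighted candidate selection idea described in the abstract, the sketch below treats a WAND query as a set of positively weighted clauses plus a threshold: a document is selected when the weights of the clauses it satisfies sum to at least the threshold. This is a minimal sketch under stated assumptions, not the paper's implementation. The names `Clause`, `wand_score`, and `select_candidates`, the attribute names, the weights, and the threshold are all hypothetical, and the brute-force scan stands in for the inverted-index evaluation a real search engine would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical weighted clause: a conjunction of (attribute, value) terms."""
    terms: tuple   # tuple of (attribute, value) pairs that must all match
    weight: float  # positive weight (illustrative value, not a learned one)


def clause_matches(clause, doc):
    """A clause matches when every one of its terms is present in the document."""
    return all(value in doc.get(attr, set()) for attr, value in clause.terms)


def wand_score(clauses, doc):
    """Sum the weights of all clauses that the document satisfies."""
    return sum(c.weight for c in clauses if clause_matches(c, doc))


def select_candidates(clauses, threshold, docs):
    """Keep documents whose matched-clause weight reaches the threshold.

    Assumption: a linear scan over all documents, for clarity only.
    """
    return [doc_id for doc_id, doc in docs.items()
            if wand_score(clauses, doc) >= threshold]


# Hypothetical query clauses and structured documents (attribute -> set of values).
CLAUSES = [
    Clause((("title", "engineer"),), 2.0),
    Clause((("location", "nyc"),), 1.0),
    Clause((("title", "engineer"), ("skill", "java")), 1.5),
]
DOCS = {
    "job1": {"title": {"engineer"}, "skill": {"java"}, "location": {"sf"}},
    "job2": {"title": {"designer"}, "location": {"nyc"}},
    "job3": {"title": {"engineer"}, "location": {"nyc"}},
}

if __name__ == "__main__":
    print(select_candidates(CLAUSES, 3.0, DOCS))
```

Raising the threshold prunes more documents before the expensive scoring stage, while lowering it retrieves more candidates; that trade-off between latency and retrieved-result quality is what the abstract describes.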

        • Published in

          KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
          August 2016
          2176 pages
          ISBN: 9781450342322
          DOI: 10.1145/2939672

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
          Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%