skip to main content
research-article

Diversifying Query Auto-Completion

Authors Info & Claims
Published:09 June 2016Publication History
Skip Abstract Section

Abstract

Query auto-completion assists web search users in formulating queries with a few keystrokes, helping them to avoid spelling mistakes and to produce clear query expressions, and so on. Previous work on query auto-completion mainly centers around returning a list of completions to users, aiming to push queries that are most likely intended by the user to the top positions but ignoring the redundancy among the query candidates in the list. Thus, semantically related queries matching the input prefix are often returned together. This may push valuable suggestions out of the list, given that only a limited number of candidates can be shown to the user, which may result in a less than optimal search experience.

In this article, we consider the task of diversifying query auto-completion, which aims to return the correct query completions early in a ranked list of candidate completions and at the same time reduce the redundancy among query auto-completion candidates. We develop a greedy query selection approach that predicts query completions based on the current search popularity of candidate completions and on the aspects of previous queries in the same search session. The popularity of completion candidates at query time can be directly aggregated from query logs. However, query aspects are implicitly expressed by previous clicked documents in the search context. To determine the query aspect, we categorize clicked documents of a query using a hierarchy based on the open directory project. Bayesian probabilistic matrix factorization is applied to derive the distribution of queries over all aspects. We quantify the improvement of our greedy query selection model against a state-of-the-art baseline using two large-scale, real-world query logs and show that it beats the baseline in terms of well-known metrics used in query auto-completion and diversification. In addition, we conduct a side-by-side experiment to verify the effectiveness of our proposal.

References

  1. Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In WSDM’09. ACM, New York, NY, 5--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Kevin Bache, David Newman, and Padhraic Smyth. 2013. Text-based measures of document diversity. In KDD’13. ACM, New York, NY, 23--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Ziv Bar-Yossef and Naama Kraus. 2011. Context-sensitive query auto-completion. In WWW’11. ACM, New York, NY, 107--116. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’12. ACM, New York, NY, 185--194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2007. Measuring semantic similarity between words using web search engines. In WWW’07. ACM, New York, NY, 757--766. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Fei Cai and Maarten de Rijke. 2016a. Learning from homologous queries and semantically related terms for query auto completion. Information Processing and Management 52, 4, 628--643. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Fei Cai and Maarten de Rijke. 2016b. Query auto completion in information retrieval. Foundations and Trends in Information Retrieval Submitted.Google ScholarGoogle Scholar
  8. Fei Cai and Maarten de Rijke. 2016c. Selectively personalizing query auto-completion. In SIGIR’16. ACM, New York, NY, To appear.Google ScholarGoogle Scholar
  9. Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014a. Personalized document re-ranking based on Bayesian probabilistic matrix factorization. In SIGIR’14. ACM, New York, NY, 835--838. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014b. Time-sensitive personalized query auto-completion. In CIKM’14. ACM, New York, NY, 1599--1608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Fei Cai, Shangsong Liang, and Maarten de Rijke. 2016a. Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Transactions on Knowledge and Data Engineering To appear.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Fei Cai, Shuaiqiang Wang, and Maarten de Rijke. 2016b. Behavior-based personalization in web search. Journal of the Association for Information Science and Technology To appear.Google ScholarGoogle Scholar
  13. Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR’98. ACM, New York, NY, 335--336. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems 30, 1, 6:1--6:41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In CIKM’09. ACM, New York, NY, 621--630. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Harr Chen and David R. Karger. 2006. Less is more: Probabilistic models for retrieving fewer relevant documents. In SIGIR’06. ACM, New York, NY, 429--436. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Steve Chien and Nicole Immorlica. 2005. Semantic similarity between search engine queries using temporal correlation. In WWW’05. ACM, New York, NY, 2--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click Models for Web Search. Morgan & Claypool Publishers, San Francisco, CA.Google ScholarGoogle Scholar
  19. Aleksandr Chuklin, Pavel Serdyukov, and Maarten de Rijke. 2013. Using intent information to model user behavior in diversified search. In ECIR’13. Springer, Berlin, 1--13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In SIGIR’08. ACM, New York, NY, 659--666. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Charles L. Clarke, Maheedhar Kolla, and Olga Vechtomova. 2009. An effectiveness measure for ambiguous and underspecified queries. In ICTIR’09. Springer, Berlin, 188--199. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Kevyn Collins-Thompson, Paul Bennett, Charles L. A. Clarke, and Ellen M. Voorhees. 2013. TREC 2013 Web Track overview. In TREC’13. Springer, Berlin, 1--15.Google ScholarGoogle Scholar
  23. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press, Cambridge, MA. Google ScholarGoogle Scholar
  24. Nick Craswell, Rosie Jones, Georges Dupret, and Evelyne Viegas (Eds.). 2009. WSCD’09: Proceedings 2009 Workshop on Web Search Click Data. ACM. Google ScholarGoogle ScholarCross RefCross Ref
  25. Van Dang and Bruce W. Croft. 2013. Term level search result diversification. In SIGIR’13. ACM, New York, NY, 603--612. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Van Dang and W. Bruce Croft. 2012. Diversity by proportionality: An election-based approach to search result diversification. In SIGIR’12. ACM, New York, NY, 65--74. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1, 1--38.Google ScholarGoogle ScholarCross RefCross Ref
  28. João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. Computing Surveys 46, 4, 44:1--44:37. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Jiafeng Guo, Xueqi Cheng, Gu Xu, and Xiaofei Zhu. 2011. Intent-aware query similarity. In CIKM’11. ACM, New York, NY, 259--268. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Jiyin He, Edgar Meij, and Maarten de Rijke. 2011. Result diversification based on query-specific cluster ranking. Journal of the Association for Information Science and Technology 62, 3, 550--571. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Katja Hofmann, Bhaskar Mitra, Filip Radlinski, and Milad Shokouhi. 2014. An eye-tracking study of user interactions with query auto completion. In CIKM’14. ACM, New York, NY, 549--558. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Di Jiang, Kenneth Wai-Ting Leung, Jan Vosecky, and Wilfred Ng. 2009. Web query recommendation via sequential query prediction. In ICDE’09. IEEE Computer Society, Washington, DC, 1443--1454. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, and Pu-Jen Cheng. 2014. Learning user reformulation behavior for query auto-completion. In SIGIR’14. ACM, New York, NY, USA, 445--454. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In KDD’02. ACM, New York, NY, 133--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, and Iadh Ounis. 2013. Intent models for contextualising and diversifying query suggestions. In CIKM’13. ACM, New York, NY, 2303--2308. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Youngho Kim and W. Bruce Croft. 2014. Diversifying query suggestions based on query documents. In SIGIR’14. ACM, New York, NY, 891--894. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Hongyuan Zha, and Ricardo Baeza-Yates. 2015. Analyzing user’s sequential behavior in query auto-completion via Markov processes. In SIGIR’15. ACM, New York, NY, 123--132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, and Eric Lo. 2012. DQR: A probabilistic approach to diversified query recommendation. In CIKM’12. ACM, New York, NY, 16--25. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Yanen Li, Anlei Dong, Hongning Wang, Hongbo Deng, Yi Chang, and ChengXiang Zhai. 2014. A two-dimensional click model for query auto-completion. In SIGIR’14. ACM, New York, NY, 455--464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014a. Fusion helps diversification. In SIGIR’14. ACM, New York, NY, 303--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014b. Personalized search result diversification via structured learning. In KDD’14. ACM, New York, NY, 751--760. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM Transactions on Intelligent Systems and Technology 3, 1, Article 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Hao Ma, Michael R. Lyu, and Irwin King. 2010. Diversifying query suggestion results. In AAAI’10. AAAI Press, New York, NY, 1399--1404. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. MIT Press, Cambridge, MA, 1--13.Google ScholarGoogle Scholar
  46. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS 26. MIT Press, Cambridge, MA, 3111--3119.Google ScholarGoogle Scholar
  47. Bhaskar Mitra. 2015. Exploring session context using distributed representations of queries and reformulations. In SIGIR’15. ACM, New York, NY, USA, 3--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Bhaskar Mitra, Milad Shokouhi, Filip Radlinski, and Katja Hofmann. 2014. On user interactions with query auto-completion. In SIGIR’14. ACM, New York, NY, 1055--1058. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Radford M. Neal. 1993. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1. Department of Computer Science, University of Toronto, Toronto, ON.Google ScholarGoogle Scholar
  50. Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. In InfoScale’06. ACM, New York, NY, 1--7. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Filip Radlinski and Susan Dumais. 2006. Improving personalized web search using result diversification. In SIGIR’06. ACM, New York, NY, 691--692. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Ruslan Salakhutdinov and Andriy Mnih. 2008a. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML’08. ACM, New York, NY, 880--887. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Ruslan Salakhutdinov and Andriy Mnih. 2008b. Probabilistic matrix factorization. In NIPS 20. MIT Press, Cambridge, MA, 1--8.Google ScholarGoogle Scholar
  54. Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010a. Exploiting query reformulations for web search result diversification. In WWW’10. ACM, New York, NY, 881--890. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010b. Selectively diversifying web search results. In CIKM’10. ACM, New York, NY, 1179--1188. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2011. Intent-aware search result diversification. In SIGIR’11. ACM, New York, NY, 595--604. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Rodrygo L. Santos, Craig Macdonald, and Iadh Ounis. 2013. Learning to rank query suggestions for adhoc and diversity search. Information Retrieval 16, 4, 429--451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Milad Shokouhi. 2011. Detecting seasonal queries by time-series analysis. In SIGIR’11. ACM, New York, NY, 1171--1172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Milad Shokouhi. 2013. Learning to personalize query auto-completion. In SIGIR’13. ACM, New York, NY, 103--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Milad Shokouhi and Kira Radinsky. 2012. Time-sensitive query auto-completion. In SIGIR’12. ACM, New York, NY, 601--610. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Yang Song, Dengyong Zhou, and Li-wei He. 2011. Post-ranking query suggestion by diversifying search results. In SIGIR’11. ACM, New York, NY, 815--824. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Paul Thomas and David Hawking. 2006. Evaluation by comparing result sets in context. In CIKM’06. ACM, New York, NY, 94--101. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. David Vallet. 2011. Crowdsourced evaluation of personalization and diversification techniques in web search. In CIR’11 Workshop (SIGIR’11). ACM, New York, NY, 1--6.Google ScholarGoogle Scholar
  64. David Vallet and Pablo Castells. 2011. On diversifying and personalizing web search. In SIGIR’11. ACM, New York, NY, 1157--1158. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. David Vallet and Pablo Castells. 2012. Personalized diversification of search results. In SIGIR’12. ACM, New York, NY, 841--850. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Saúl Vargas, Pablo Castells, and David Vallet. 2012. Explicit relevance models in intent-oriented information retrieval diversification. In SIGIR’12. ACM, New York, NY, 75--84. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Stewart Whiting and Joemon M. Jose. 2014. Recent and robust query auto-completion. In WWW’14. ACM, New York, NY, 971--982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Cheng Xiang Zhai, William W. Cohen, and John Lafferty. 2003. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In SIGIR’03. ACM, New York, NY, 10--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, and Jiawei Han. 2015. adaQAC: Adaptive query auto-completion via implicit negative feedback. In SIGIR’15. ACM, New York, NY, 143--152. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Diversifying Query Auto-Completion

      Recommendations

      Reviews

      Epaminondas Kapetanios

      In an era when many aspects of artificial intelligence (AI) and machine learning (ML) are increasing in popularity because of the proliferation of technology via a range of representational state transfer (REST) application programming interfaces (APIs) and frameworks-such as those intended for natural language processing (NLP), including text-to-text translation services, conversational user interfaces, and intelligent search engines-smart technology is becoming increasingly accessible for developers and businesses to create their own services. There is an ever-increasing interest in smarter, more assistive search engines and applications-think of question-answering systems rather than mere keyword-based search. Understanding the intention of a query has always been one of the key intelligent applications necessary to improve the user experience with a computer in matters of getting answers rather than documents from a querying session. Since query autocompletion was invented, it has been practiced in a plethora of currently available search engines. This paper provides an excellent overview of query autocompletion technology; however, it seeks to improve the performance of query autocompletion technologies by reducing the number of suggestions for completions, some of which are notoriously redundant. Removing the redundant candidates for query completion also increases space for the inclusion of more relevant suggestions; hence, it increases the probability of formulating a more semantically precise query. Another advantage of the proposed technique is that it works as a cold-start approach, that is, when no training data are available. The proposed technique and algorithms have been evaluated on the basis of mean reciprocal ranking (MRR), a well-established evaluation technique for web search. In that context, the paper is of a rather narrow scope and, therefore, of limited interest when it comes to developing search technologies, engines, and systems. Even though the development of information retrieval systems and search engines has a much broader scope in that there is more than one factor affecting their performance, the paper may only be interesting for researchers in the query autocompletion field. Nonetheless, the paper is very well written and relies on an experimental setup, which can be recommended as guidance for aspiring PhD students, despite questions about whether the results and experiments can be reproduced, an issue recently raised as a concern by the scientific community with regard to many journal publications. Online Computing Reviews Service

      Access critical reviews of Computing literature here

      Become a reviewer for Computing Reviews.

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Information Systems
        ACM Transactions on Information Systems  Volume 34, Issue 4
        September 2016
        217 pages
        ISSN:1046-8188
        EISSN:1558-2868
        DOI:10.1145/2954381
        Issue’s Table of Contents

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 June 2016
        • Revised: 1 March 2016
        • Accepted: 1 March 2016
        • Received: 1 June 2015
        Published in tois Volume 34, Issue 4

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader