research-article

Diversifying Query Auto-Completion

Authors:
Fei Cai

National University of Defense Technology and University of Amsterdam, The Netherlands

National University of Defense Technology and University of Amsterdam, The Netherlands
View Profile

,
Ridho Reinanda

University of Amsterdam, Amsterdam, The Netherlands

University of Amsterdam, Amsterdam, The Netherlands
View Profile

,
Maarten De Rijke

University of Amsterdam, Amsterdam, The Netherlands

University of Amsterdam, Amsterdam, The Netherlands
View Profile

Authors Info & Claims

ACM Transactions on Information Systems Volume 34 Issue 4Article No.: 25pp 1–33https://doi.org/10.1145/2910579

Published:09 June 2016Publication History

ACM Transactions on Information Systems

Abstract

Query auto-completion assists web search users in formulating queries with a few keystrokes, helping them to avoid spelling mistakes and to produce clear query expressions, and so on. Previous work on query auto-completion mainly centers around returning a list of completions to users, aiming to push queries that are most likely intended by the user to the top positions but ignoring the redundancy among the query candidates in the list. Thus, semantically related queries matching the input prefix are often returned together. This may push valuable suggestions out of the list, given that only a limited number of candidates can be shown to the user, which may result in a less than optimal search experience.

In this article, we consider the task of diversifying query auto-completion, which aims to return the correct query completions early in a ranked list of candidate completions and at the same time reduce the redundancy among query auto-completion candidates. We develop a greedy query selection approach that predicts query completions based on the current search popularity of candidate completions and on the aspects of previous queries in the same search session. The popularity of completion candidates at query time can be directly aggregated from query logs. However, query aspects are implicitly expressed by previous clicked documents in the search context. To determine the query aspect, we categorize clicked documents of a query using a hierarchy based on the open directory project. Bayesian probabilistic matrix factorization is applied to derive the distribution of queries over all aspects. We quantify the improvement of our greedy query selection model against a state-of-the-art baseline using two large-scale, real-world query logs and show that it beats the baseline in terms of well-known metrics used in query auto-completion and diversification. In addition, we conduct a side-by-side experiment to verify the effectiveness of our proposal.

References

Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In WSDM’09. ACM, New York, NY, 5--14. Google ScholarDigital Library
Kevin Bache, David Newman, and Padhraic Smyth. 2013. Text-based measures of document diversity. In KDD’13. ACM, New York, NY, 23--31. Google ScholarDigital Library
Ziv Bar-Yossef and Naama Kraus. 2011. Context-sensitive query auto-completion. In WWW’11. ACM, New York, NY, 107--116. Google ScholarDigital Library
Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’12. ACM, New York, NY, 185--194. Google ScholarDigital Library
Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2007. Measuring semantic similarity between words using web search engines. In WWW’07. ACM, New York, NY, 757--766. Google ScholarDigital Library
Fei Cai and Maarten de Rijke. 2016a. Learning from homologous queries and semantically related terms for query auto completion. Information Processing and Management 52, 4, 628--643. Google ScholarDigital Library
Fei Cai and Maarten de Rijke. 2016b. Query auto completion in information retrieval. Foundations and Trends in Information Retrieval Submitted.Google Scholar
Fei Cai and Maarten de Rijke. 2016c. Selectively personalizing query auto-completion. In SIGIR’16. ACM, New York, NY, To appear.Google Scholar
Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014a. Personalized document re-ranking based on Bayesian probabilistic matrix factorization. In SIGIR’14. ACM, New York, NY, 835--838. Google ScholarDigital Library
Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014b. Time-sensitive personalized query auto-completion. In CIKM’14. ACM, New York, NY, 1599--1608. Google ScholarDigital Library
Fei Cai, Shangsong Liang, and Maarten de Rijke. 2016a. Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Transactions on Knowledge and Data Engineering To appear.Google ScholarDigital Library
Fei Cai, Shuaiqiang Wang, and Maarten de Rijke. 2016b. Behavior-based personalization in web search. Journal of the Association for Information Science and Technology To appear.Google Scholar
Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR’98. ACM, New York, NY, 335--336. Google ScholarDigital Library
Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems 30, 1, 6:1--6:41. Google ScholarDigital Library
Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In CIKM’09. ACM, New York, NY, 621--630. Google ScholarDigital Library
Harr Chen and David R. Karger. 2006. Less is more: Probabilistic models for retrieving fewer relevant documents. In SIGIR’06. ACM, New York, NY, 429--436. Google ScholarDigital Library
Steve Chien and Nicole Immorlica. 2005. Semantic similarity between search engine queries using temporal correlation. In WWW’05. ACM, New York, NY, 2--11. Google ScholarDigital Library
Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click Models for Web Search. Morgan & Claypool Publishers, San Francisco, CA.Google Scholar
Aleksandr Chuklin, Pavel Serdyukov, and Maarten de Rijke. 2013. Using intent information to model user behavior in diversified search. In ECIR’13. Springer, Berlin, 1--13. Google ScholarDigital Library
Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In SIGIR’08. ACM, New York, NY, 659--666. Google ScholarDigital Library
Charles L. Clarke, Maheedhar Kolla, and Olga Vechtomova. 2009. An effectiveness measure for ambiguous and underspecified queries. In ICTIR’09. Springer, Berlin, 188--199. Google ScholarDigital Library
Kevyn Collins-Thompson, Paul Bennett, Charles L. A. Clarke, and Ellen M. Voorhees. 2013. TREC 2013 Web Track overview. In TREC’13. Springer, Berlin, 1--15.Google Scholar
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press, Cambridge, MA. Google Scholar
Nick Craswell, Rosie Jones, Georges Dupret, and Evelyne Viegas (Eds.). 2009. WSCD’09: Proceedings 2009 Workshop on Web Search Click Data. ACM. Google ScholarCross Ref
Van Dang and Bruce W. Croft. 2013. Term level search result diversification. In SIGIR’13. ACM, New York, NY, 603--612. Google ScholarDigital Library
Van Dang and W. Bruce Croft. 2012. Diversity by proportionality: An election-based approach to search result diversification. In SIGIR’12. ACM, New York, NY, 65--74. Google ScholarDigital Library
Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1, 1--38.Google ScholarCross Ref
João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. Computing Surveys 46, 4, 44:1--44:37. Google ScholarDigital Library
Jiafeng Guo, Xueqi Cheng, Gu Xu, and Xiaofei Zhu. 2011. Intent-aware query similarity. In CIKM’11. ACM, New York, NY, 259--268. Google ScholarDigital Library
Jiyin He, Edgar Meij, and Maarten de Rijke. 2011. Result diversification based on query-specific cluster ranking. Journal of the Association for Information Science and Technology 62, 3, 550--571. Google ScholarDigital Library
Katja Hofmann, Bhaskar Mitra, Filip Radlinski, and Milad Shokouhi. 2014. An eye-tracking study of user interactions with query auto completion. In CIKM’14. ACM, New York, NY, 549--558. Google ScholarDigital Library
Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446. Google ScholarDigital Library
Di Jiang, Kenneth Wai-Ting Leung, Jan Vosecky, and Wilfred Ng. 2009. Web query recommendation via sequential query prediction. In ICDE’09. IEEE Computer Society, Washington, DC, 1443--1454. Google ScholarDigital Library
Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, and Pu-Jen Cheng. 2014. Learning user reformulation behavior for query auto-completion. In SIGIR’14. ACM, New York, NY, USA, 445--454. Google ScholarDigital Library
Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In KDD’02. ACM, New York, NY, 133--142. Google ScholarDigital Library
Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, and Iadh Ounis. 2013. Intent models for contextualising and diversifying query suggestions. In CIKM’13. ACM, New York, NY, 2303--2308. Google ScholarDigital Library
Youngho Kim and W. Bruce Croft. 2014. Diversifying query suggestions based on query documents. In SIGIR’14. ACM, New York, NY, 891--894. Google ScholarDigital Library
Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Hongyuan Zha, and Ricardo Baeza-Yates. 2015. Analyzing user’s sequential behavior in query auto-completion via Markov processes. In SIGIR’15. ACM, New York, NY, 123--132. Google ScholarDigital Library
Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, and Eric Lo. 2012. DQR: A probabilistic approach to diversified query recommendation. In CIKM’12. ACM, New York, NY, 16--25. Google ScholarDigital Library
Yanen Li, Anlei Dong, Hongning Wang, Hongbo Deng, Yi Chang, and ChengXiang Zhai. 2014. A two-dimensional click model for query auto-completion. In SIGIR’14. ACM, New York, NY, 455--464. Google ScholarDigital Library
Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014a. Fusion helps diversification. In SIGIR’14. ACM, New York, NY, 303--312. Google ScholarDigital Library
Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014b. Personalized search result diversification via structured learning. In KDD’14. ACM, New York, NY, 751--760. Google ScholarDigital Library
Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM Transactions on Intelligent Systems and Technology 3, 1, Article 17. Google ScholarDigital Library
Hao Ma, Michael R. Lyu, and Irwin King. 2010. Diversifying query suggestion results. In AAAI’10. AAAI Press, New York, NY, 1399--1404. Google ScholarDigital Library
Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. MIT Press, Cambridge, MA, 1--13.Google Scholar
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS 26. MIT Press, Cambridge, MA, 3111--3119.Google Scholar
Bhaskar Mitra. 2015. Exploring session context using distributed representations of queries and reformulations. In SIGIR’15. ACM, New York, NY, USA, 3--12. Google ScholarDigital Library
Bhaskar Mitra, Milad Shokouhi, Filip Radlinski, and Katja Hofmann. 2014. On user interactions with query auto-completion. In SIGIR’14. ACM, New York, NY, 1055--1058. Google ScholarDigital Library
Radford M. Neal. 1993. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1. Department of Computer Science, University of Toronto, Toronto, ON.Google Scholar
Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. In InfoScale’06. ACM, New York, NY, 1--7. Google ScholarDigital Library
Filip Radlinski and Susan Dumais. 2006. Improving personalized web search using result diversification. In SIGIR’06. ACM, New York, NY, 691--692. Google ScholarDigital Library
Ruslan Salakhutdinov and Andriy Mnih. 2008a. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML’08. ACM, New York, NY, 880--887. Google ScholarDigital Library
Ruslan Salakhutdinov and Andriy Mnih. 2008b. Probabilistic matrix factorization. In NIPS 20. MIT Press, Cambridge, MA, 1--8.Google Scholar
Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010a. Exploiting query reformulations for web search result diversification. In WWW’10. ACM, New York, NY, 881--890. Google ScholarDigital Library
Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010b. Selectively diversifying web search results. In CIKM’10. ACM, New York, NY, 1179--1188. Google ScholarDigital Library
Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2011. Intent-aware search result diversification. In SIGIR’11. ACM, New York, NY, 595--604. Google ScholarDigital Library
Rodrygo L. Santos, Craig Macdonald, and Iadh Ounis. 2013. Learning to rank query suggestions for adhoc and diversity search. Information Retrieval 16, 4, 429--451. Google ScholarDigital Library
Milad Shokouhi. 2011. Detecting seasonal queries by time-series analysis. In SIGIR’11. ACM, New York, NY, 1171--1172. Google ScholarDigital Library
Milad Shokouhi. 2013. Learning to personalize query auto-completion. In SIGIR’13. ACM, New York, NY, 103--112. Google ScholarDigital Library
Milad Shokouhi and Kira Radinsky. 2012. Time-sensitive query auto-completion. In SIGIR’12. ACM, New York, NY, 601--610. Google ScholarDigital Library
Yang Song, Dengyong Zhou, and Li-wei He. 2011. Post-ranking query suggestion by diversifying search results. In SIGIR’11. ACM, New York, NY, 815--824. Google ScholarDigital Library
Paul Thomas and David Hawking. 2006. Evaluation by comparing result sets in context. In CIKM’06. ACM, New York, NY, 94--101. Google ScholarDigital Library
David Vallet. 2011. Crowdsourced evaluation of personalization and diversification techniques in web search. In CIR’11 Workshop (SIGIR’11). ACM, New York, NY, 1--6.Google Scholar
David Vallet and Pablo Castells. 2011. On diversifying and personalizing web search. In SIGIR’11. ACM, New York, NY, 1157--1158. Google ScholarDigital Library
David Vallet and Pablo Castells. 2012. Personalized diversification of search results. In SIGIR’12. ACM, New York, NY, 841--850. Google ScholarDigital Library
Saúl Vargas, Pablo Castells, and David Vallet. 2012. Explicit relevance models in intent-oriented information retrieval diversification. In SIGIR’12. ACM, New York, NY, 75--84. Google ScholarDigital Library
Stewart Whiting and Joemon M. Jose. 2014. Recent and robust query auto-completion. In WWW’14. ACM, New York, NY, 971--982. Google ScholarDigital Library
Cheng Xiang Zhai, William W. Cohen, and John Lafferty. 2003. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In SIGIR’03. ACM, New York, NY, 10--17. Google ScholarDigital Library
Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, and Jiawei Han. 2015. adaQAC: Adaptive query auto-completion via implicit negative feedback. In SIGIR’15. ACM, New York, NY, 143--152. Google ScholarDigital Library

Index Terms

Diversifying Query Auto-Completion
1. Information systems
  1. Information retrieval
  2. Information storage systems

Recommendations

Query Auto-Completion for Rare Prefixes
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management

Query auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking ...
Read More
Learning to personalize query auto-completion
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Query auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new key stroke. Query prefixes ...
Read More
Classifying User Search Intents for Query Auto-Completion
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval

The function of query auto-completion in modern search engines is to help users formulate queries fast and precisely. Conventional context-aware methods primarily rank candidate queries according to term- and query- relationships to the context. However,...
Read More

Reviews

Reviewer: Epaminondas Kapetanios

In an era when many aspects of artificial intelligence (AI) and machine learning (ML) are increasing in popularity because of the proliferation of technology via a range of representational state transfer (REST) application programming interfaces (APIs) and frameworks-such as those intended for natural language processing (NLP), including text-to-text translation services, conversational user interfaces, and intelligent search engines-smart technology is becoming increasingly accessible for developers and businesses to create their own services. There is an ever-increasing interest in smarter, more assistive search engines and applications-think of question-answering systems rather than mere keyword-based search. Understanding the intention of a query has always been one of the key intelligent applications necessary to improve the user experience with a computer in matters of getting answers rather than documents from a querying session. Since query autocompletion was invented, it has been practiced in a plethora of currently available search engines. This paper provides an excellent overview of query autocompletion technology; however, it seeks to improve the performance of query autocompletion technologies by reducing the number of suggestions for completions, some of which are notoriously redundant. Removing the redundant candidates for query completion also increases space for the inclusion of more relevant suggestions; hence, it increases the probability of formulating a more semantically precise query. Another advantage of the proposed technique is that it works as a cold-start approach, that is, when no training data are available. The proposed technique and algorithms have been evaluated on the basis of mean reciprocal ranking (MRR), a well-established evaluation technique for web search. In that context, the paper is of a rather narrow scope and, therefore, of limited interest when it comes to developing search technologies, engines, and systems. Even though the development of information retrieval systems and search engines has a much broader scope in that there is more than one factor affecting their performance, the paper may only be interesting for researchers in the query autocompletion field. Nonetheless, the paper is very well written and relies on an experimental setup, which can be recommended as guidance for aspiring PhD students, despite questions about whether the results and experiments can be reproduced, an issue recently raised as a concern by the scientific community with regard to many journal publications. Online Computing Reviews Service

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Information Systems Volume 34, Issue 4
September 2016
217 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/2954381
Editor:
Maarten de Rijke
University of Amsterdam, The Netherlands
Issue’s Table of Contents
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2016
- Revised: 1 March 2016
- Accepted: 1 March 2016
- Received: 1 June 2015
Published in tois Volume 34, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Query auto-completion
diversification
query suggestion
web search
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 35
  Total Citations
  View Citations
- 530
  Total Downloads
- Downloads (Last 12 months)24
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Diversifying Query Auto-Completion

ACM Transactions on Information Systems

Abstract

References

Cited By

Index Terms

Recommendations

Query Auto-Completion for Rare Prefixes

Learning to personalize query auto-completion

Classifying User Search Intents for Query Auto-Completion

Reviews

Access critical reviews of Computing literature here