Abstract
Query auto-completion assists web search users in formulating queries with a few keystrokes, helping them to avoid spelling mistakes and to produce clear query expressions, and so on. Previous work on query auto-completion mainly centers around returning a list of completions to users, aiming to push queries that are most likely intended by the user to the top positions but ignoring the redundancy among the query candidates in the list. Thus, semantically related queries matching the input prefix are often returned together. This may push valuable suggestions out of the list, given that only a limited number of candidates can be shown to the user, which may result in a less than optimal search experience.
In this article, we consider the task of diversifying query auto-completion, which aims to return the correct query completions early in a ranked list of candidate completions and at the same time reduce the redundancy among query auto-completion candidates. We develop a greedy query selection approach that predicts query completions based on the current search popularity of candidate completions and on the aspects of previous queries in the same search session. The popularity of completion candidates at query time can be directly aggregated from query logs. However, query aspects are implicitly expressed by previous clicked documents in the search context. To determine the query aspect, we categorize clicked documents of a query using a hierarchy based on the open directory project. Bayesian probabilistic matrix factorization is applied to derive the distribution of queries over all aspects. We quantify the improvement of our greedy query selection model against a state-of-the-art baseline using two large-scale, real-world query logs and show that it beats the baseline in terms of well-known metrics used in query auto-completion and diversification. In addition, we conduct a side-by-side experiment to verify the effectiveness of our proposal.
- Rakesh Agrawal, Sreenivas Gollapudi, Alan Halverson, and Samuel Ieong. 2009. Diversifying search results. In WSDM’09. ACM, New York, NY, 5--14. Google ScholarDigital Library
- Kevin Bache, David Newman, and Padhraic Smyth. 2013. Text-based measures of document diversity. In KDD’13. ACM, New York, NY, 23--31. Google ScholarDigital Library
- Ziv Bar-Yossef and Naama Kraus. 2011. Context-sensitive query auto-completion. In WWW’11. ACM, New York, NY, 107--116. Google ScholarDigital Library
- Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’12. ACM, New York, NY, 185--194. Google ScholarDigital Library
- Danushka Bollegala, Yutaka Matsuo, and Mitsuru Ishizuka. 2007. Measuring semantic similarity between words using web search engines. In WWW’07. ACM, New York, NY, 757--766. Google ScholarDigital Library
- Fei Cai and Maarten de Rijke. 2016a. Learning from homologous queries and semantically related terms for query auto completion. Information Processing and Management 52, 4, 628--643. Google ScholarDigital Library
- Fei Cai and Maarten de Rijke. 2016b. Query auto completion in information retrieval. Foundations and Trends in Information Retrieval Submitted.Google Scholar
- Fei Cai and Maarten de Rijke. 2016c. Selectively personalizing query auto-completion. In SIGIR’16. ACM, New York, NY, To appear.Google Scholar
- Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014a. Personalized document re-ranking based on Bayesian probabilistic matrix factorization. In SIGIR’14. ACM, New York, NY, 835--838. Google ScholarDigital Library
- Fei Cai, Shangsong Liang, and Maarten de Rijke. 2014b. Time-sensitive personalized query auto-completion. In CIKM’14. ACM, New York, NY, 1599--1608. Google ScholarDigital Library
- Fei Cai, Shangsong Liang, and Maarten de Rijke. 2016a. Prefix-adaptive and time-sensitive personalized query auto completion. IEEE Transactions on Knowledge and Data Engineering To appear.Google ScholarDigital Library
- Fei Cai, Shuaiqiang Wang, and Maarten de Rijke. 2016b. Behavior-based personalization in web search. Journal of the Association for Information Science and Technology To appear.Google Scholar
- Jaime Carbonell and Jade Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR’98. ACM, New York, NY, 335--336. Google ScholarDigital Library
- Olivier Chapelle, Thorsten Joachims, Filip Radlinski, and Yisong Yue. 2012. Large-scale validation and analysis of interleaved search evaluation. ACM Transactions on Information Systems 30, 1, 6:1--6:41. Google ScholarDigital Library
- Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. Expected reciprocal rank for graded relevance. In CIKM’09. ACM, New York, NY, 621--630. Google ScholarDigital Library
- Harr Chen and David R. Karger. 2006. Less is more: Probabilistic models for retrieving fewer relevant documents. In SIGIR’06. ACM, New York, NY, 429--436. Google ScholarDigital Library
- Steve Chien and Nicole Immorlica. 2005. Semantic similarity between search engine queries using temporal correlation. In WWW’05. ACM, New York, NY, 2--11. Google ScholarDigital Library
- Aleksandr Chuklin, Ilya Markov, and Maarten de Rijke. 2015. Click Models for Web Search. Morgan & Claypool Publishers, San Francisco, CA.Google Scholar
- Aleksandr Chuklin, Pavel Serdyukov, and Maarten de Rijke. 2013. Using intent information to model user behavior in diversified search. In ECIR’13. Springer, Berlin, 1--13. Google ScholarDigital Library
- Charles L. A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, and Ian MacKinnon. 2008. Novelty and diversity in information retrieval evaluation. In SIGIR’08. ACM, New York, NY, 659--666. Google ScholarDigital Library
- Charles L. Clarke, Maheedhar Kolla, and Olga Vechtomova. 2009. An effectiveness measure for ambiguous and underspecified queries. In ICTIR’09. Springer, Berlin, 188--199. Google ScholarDigital Library
- Kevyn Collins-Thompson, Paul Bennett, Charles L. A. Clarke, and Ellen M. Voorhees. 2013. TREC 2013 Web Track overview. In TREC’13. Springer, Berlin, 1--15.Google Scholar
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms. MIT Press, Cambridge, MA. Google Scholar
- Nick Craswell, Rosie Jones, Georges Dupret, and Evelyne Viegas (Eds.). 2009. WSCD’09: Proceedings 2009 Workshop on Web Search Click Data. ACM. Google ScholarCross Ref
- Van Dang and Bruce W. Croft. 2013. Term level search result diversification. In SIGIR’13. ACM, New York, NY, 603--612. Google ScholarDigital Library
- Van Dang and W. Bruce Croft. 2012. Diversity by proportionality: An election-based approach to search result diversification. In SIGIR’12. ACM, New York, NY, 65--74. Google ScholarDigital Library
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1, 1--38.Google ScholarCross Ref
- João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. 2014. A survey on concept drift adaptation. Computing Surveys 46, 4, 44:1--44:37. Google ScholarDigital Library
- Jiafeng Guo, Xueqi Cheng, Gu Xu, and Xiaofei Zhu. 2011. Intent-aware query similarity. In CIKM’11. ACM, New York, NY, 259--268. Google ScholarDigital Library
- Jiyin He, Edgar Meij, and Maarten de Rijke. 2011. Result diversification based on query-specific cluster ranking. Journal of the Association for Information Science and Technology 62, 3, 550--571. Google ScholarDigital Library
- Katja Hofmann, Bhaskar Mitra, Filip Radlinski, and Milad Shokouhi. 2014. An eye-tracking study of user interactions with query auto completion. In CIKM’14. ACM, New York, NY, 549--558. Google ScholarDigital Library
- Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems 20, 4, 422--446. Google ScholarDigital Library
- Di Jiang, Kenneth Wai-Ting Leung, Jan Vosecky, and Wilfred Ng. 2009. Web query recommendation via sequential query prediction. In ICDE’09. IEEE Computer Society, Washington, DC, 1443--1454. Google ScholarDigital Library
- Jyun-Yu Jiang, Yen-Yu Ke, Pao-Yu Chien, and Pu-Jen Cheng. 2014. Learning user reformulation behavior for query auto-completion. In SIGIR’14. ACM, New York, NY, USA, 445--454. Google ScholarDigital Library
- Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In KDD’02. ACM, New York, NY, 133--142. Google ScholarDigital Library
- Eugene Kharitonov, Craig Macdonald, Pavel Serdyukov, and Iadh Ounis. 2013. Intent models for contextualising and diversifying query suggestions. In CIKM’13. ACM, New York, NY, 2303--2308. Google ScholarDigital Library
- Youngho Kim and W. Bruce Croft. 2014. Diversifying query suggestions based on query documents. In SIGIR’14. ACM, New York, NY, 891--894. Google ScholarDigital Library
- Liangda Li, Hongbo Deng, Anlei Dong, Yi Chang, Hongyuan Zha, and Ricardo Baeza-Yates. 2015. Analyzing user’s sequential behavior in query auto-completion via Markov processes. In SIGIR’15. ACM, New York, NY, 123--132. Google ScholarDigital Library
- Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, and Eric Lo. 2012. DQR: A probabilistic approach to diversified query recommendation. In CIKM’12. ACM, New York, NY, 16--25. Google ScholarDigital Library
- Yanen Li, Anlei Dong, Hongning Wang, Hongbo Deng, Yi Chang, and ChengXiang Zhai. 2014. A two-dimensional click model for query auto-completion. In SIGIR’14. ACM, New York, NY, 455--464. Google ScholarDigital Library
- Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014a. Fusion helps diversification. In SIGIR’14. ACM, New York, NY, 303--312. Google ScholarDigital Library
- Shangsong Liang, Zhaochun Ren, and Maarten de Rijke. 2014b. Personalized search result diversification via structured learning. In KDD’14. ACM, New York, NY, 751--760. Google ScholarDigital Library
- Zhen Liao, Daxin Jiang, Enhong Chen, Jian Pei, Huanhuan Cao, and Hang Li. 2011. Mining concept sequences from large-scale search logs for context-aware query suggestion. ACM Transactions on Intelligent Systems and Technology 3, 1, Article 17. Google ScholarDigital Library
- Hao Ma, Michael R. Lyu, and Irwin King. 2010. Diversifying query suggestion results. In AAAI’10. AAAI Press, New York, NY, 1399--1404. Google ScholarDigital Library
- Tomas Mikolov, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Proceedings of Workshop at ICLR. MIT Press, Cambridge, MA, 1--13.Google Scholar
- Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. 2013b. Distributed representations of words and phrases and their compositionality. In NIPS 26. MIT Press, Cambridge, MA, 3111--3119.Google Scholar
- Bhaskar Mitra. 2015. Exploring session context using distributed representations of queries and reformulations. In SIGIR’15. ACM, New York, NY, USA, 3--12. Google ScholarDigital Library
- Bhaskar Mitra, Milad Shokouhi, Filip Radlinski, and Katja Hofmann. 2014. On user interactions with query auto-completion. In SIGIR’14. ACM, New York, NY, 1055--1058. Google ScholarDigital Library
- Radford M. Neal. 1993. Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical Report CRG-TR-93-1. Department of Computer Science, University of Toronto, Toronto, ON.Google Scholar
- Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A picture of search. In InfoScale’06. ACM, New York, NY, 1--7. Google ScholarDigital Library
- Filip Radlinski and Susan Dumais. 2006. Improving personalized web search using result diversification. In SIGIR’06. ACM, New York, NY, 691--692. Google ScholarDigital Library
- Ruslan Salakhutdinov and Andriy Mnih. 2008a. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML’08. ACM, New York, NY, 880--887. Google ScholarDigital Library
- Ruslan Salakhutdinov and Andriy Mnih. 2008b. Probabilistic matrix factorization. In NIPS 20. MIT Press, Cambridge, MA, 1--8.Google Scholar
- Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010a. Exploiting query reformulations for web search result diversification. In WWW’10. ACM, New York, NY, 881--890. Google ScholarDigital Library
- Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2010b. Selectively diversifying web search results. In CIKM’10. ACM, New York, NY, 1179--1188. Google ScholarDigital Library
- Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. 2011. Intent-aware search result diversification. In SIGIR’11. ACM, New York, NY, 595--604. Google ScholarDigital Library
- Rodrygo L. Santos, Craig Macdonald, and Iadh Ounis. 2013. Learning to rank query suggestions for adhoc and diversity search. Information Retrieval 16, 4, 429--451. Google ScholarDigital Library
- Milad Shokouhi. 2011. Detecting seasonal queries by time-series analysis. In SIGIR’11. ACM, New York, NY, 1171--1172. Google ScholarDigital Library
- Milad Shokouhi. 2013. Learning to personalize query auto-completion. In SIGIR’13. ACM, New York, NY, 103--112. Google ScholarDigital Library
- Milad Shokouhi and Kira Radinsky. 2012. Time-sensitive query auto-completion. In SIGIR’12. ACM, New York, NY, 601--610. Google ScholarDigital Library
- Yang Song, Dengyong Zhou, and Li-wei He. 2011. Post-ranking query suggestion by diversifying search results. In SIGIR’11. ACM, New York, NY, 815--824. Google ScholarDigital Library
- Paul Thomas and David Hawking. 2006. Evaluation by comparing result sets in context. In CIKM’06. ACM, New York, NY, 94--101. Google ScholarDigital Library
- David Vallet. 2011. Crowdsourced evaluation of personalization and diversification techniques in web search. In CIR’11 Workshop (SIGIR’11). ACM, New York, NY, 1--6.Google Scholar
- David Vallet and Pablo Castells. 2011. On diversifying and personalizing web search. In SIGIR’11. ACM, New York, NY, 1157--1158. Google ScholarDigital Library
- David Vallet and Pablo Castells. 2012. Personalized diversification of search results. In SIGIR’12. ACM, New York, NY, 841--850. Google ScholarDigital Library
- Saúl Vargas, Pablo Castells, and David Vallet. 2012. Explicit relevance models in intent-oriented information retrieval diversification. In SIGIR’12. ACM, New York, NY, 75--84. Google ScholarDigital Library
- Stewart Whiting and Joemon M. Jose. 2014. Recent and robust query auto-completion. In WWW’14. ACM, New York, NY, 971--982. Google ScholarDigital Library
- Cheng Xiang Zhai, William W. Cohen, and John Lafferty. 2003. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In SIGIR’03. ACM, New York, NY, 10--17. Google ScholarDigital Library
- Aston Zhang, Amit Goyal, Weize Kong, Hongbo Deng, Anlei Dong, Yi Chang, Carl A. Gunter, and Jiawei Han. 2015. adaQAC: Adaptive query auto-completion via implicit negative feedback. In SIGIR’15. ACM, New York, NY, 143--152. Google ScholarDigital Library
Index Terms
- Diversifying Query Auto-Completion
Recommendations
Query Auto-Completion for Rare Prefixes
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge ManagementQuery auto-completion (QAC) systems typically suggest queries that have previously been observed in search logs. Given a partial user query, the system looks up this query prefix against a precomputed set of candidates, then orders them using ranking ...
Learning to personalize query auto-completion
SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrievalQuery auto-completion (QAC) is one of the most prominent features of modern search engines. The list of query candidates is generated according to the prefix entered by the user in the search box and is updated on each new key stroke. Query prefixes ...
Classifying User Search Intents for Query Auto-Completion
ICTIR '16: Proceedings of the 2016 ACM International Conference on the Theory of Information RetrievalThe function of query auto-completion in modern search engines is to help users formulate queries fast and precisely. Conventional context-aware methods primarily rank candidate queries according to term- and query- relationships to the context. However,...
Comments