ABSTRACT
Click logs present a wealth of evidence about how users interact with a search system. This evidence has been used for many things: learning rankings, personalizing, evaluating effectiveness, and more. But it is almost always distilled into point estimates of feature or parameter values, ignoring what may be the most salient feature of users: their variability. No two users interact with a system in exactly the same way, and even a single user may interact with results for the same query differently depending on information need, mood, time of day, and a host of other factors. We present a Bayesian approach to using logs to compute posterior distributions for probabilistic models of user interactions. Since they are distributions rather than point estimates, they naturally capture variability in the population. We show how to cluster posterior distributions to discover patterns of user interactions in logs, and discuss how to use the clusters to evaluate search engines according to a user model. Because the approach is Bayesian, our methods can be applied to very large logs (such as those possessed by Web search engines) as well as very small (such as those found in almost any other setting).
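As a rough illustration of the idea (a minimal sketch, not the authors' method): suppose each user's probability of clicking a result is modeled as a Bernoulli parameter with a Beta prior. Each user's log then yields a Beta posterior via the conjugate update, and users can be clustered by comparing whole posterior distributions rather than point estimates, so population variability is preserved. The threshold, prior, and single-linkage clustering below are all assumptions for the sketch.

```python
from math import exp, lgamma, sqrt

def beta_posterior(clicks, skips, a0=1.0, b0=1.0):
    """Conjugate update: Beta(a0, b0) prior plus Bernoulli click observations."""
    return a0 + clicks, b0 + skips

def _log_beta(a, b):
    # log of the Beta function B(a, b) via log-gamma for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def hellinger(p, q):
    """Closed-form Hellinger distance between Beta(a1,b1) and Beta(a2,b2).

    The Bhattacharyya coefficient of two Beta densities is
    B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1,b1) * B(a2,b2)).
    """
    (a1, b1), (a2, b2) = p, q
    log_bc = (_log_beta((a1 + a2) / 2, (b1 + b2) / 2)
              - 0.5 * (_log_beta(a1, b1) + _log_beta(a2, b2)))
    return sqrt(max(0.0, 1.0 - exp(log_bc)))

def cluster(posteriors, threshold=0.3):
    """Naive single-linkage clustering of posterior distributions:
    merge groups containing any pair of posteriors within `threshold`."""
    clusters = [[p] for p in posteriors]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(hellinger(p, q) < threshold
                       for p in clusters[i] for q in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Three hypothetical users: two frequent clickers and one rare clicker.
users = [beta_posterior(30, 5), beta_posterior(28, 6), beta_posterior(2, 40)]
groups = cluster(users)
```

With these toy counts the two heavy clickers end up in one cluster and the rare clicker in another; in the paper's setting the clusters would then be used to drive a user model for evaluation.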
Index Terms
- Incorporating variability in user behavior into systems based evaluation