ABSTRACT
Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of a click is influenced by a document's position in the results page. This paper focuses on explaining that bias, modelling how the probability of a click depends on position. We propose four simple hypotheses about how position bias might arise. We carry out a large data-gathering effort, in which we perturb the ranking of a major search engine, to see how clicks are affected. We then explore which of the four hypotheses best explains the real-world position effects, and compare these hypotheses to a simple logistic regression model. The data are not well explained by simple position models, in which some users click indiscriminately on rank 1 or there is a simple decay of attention over ranks. A 'cascade' model, where users view results from top to bottom and leave as soon as they see a worthwhile document, is our best explanation for position bias in early ranks.
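The cascade behaviour described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each document has a single probability of being judged worthwhile once it is viewed, that the user clicks a document exactly when it is judged worthwhile, and that the session ends at the first click. The function name and the input values are hypothetical and are not taken from the paper.

```python
from typing import List


def cascade_click_probabilities(worthwhile: List[float]) -> List[float]:
    """Click probability per rank under a simple cascade assumption.

    worthwhile[i] is the assumed probability that the document at rank i
    (0-based, top first) is judged worthwhile when the user views it.
    The user scans from the top and leaves at the first worthwhile
    document, so rank i is clicked only if every earlier rank was skipped.
    """
    click_probs = []
    p_reach = 1.0  # probability that the user gets as far as this rank
    for r in worthwhile:
        click_probs.append(p_reach * r)
        p_reach *= 1.0 - r  # user continues only if this document was skipped
    return click_probs


if __name__ == "__main__":
    # Three equally attractive documents: lower ranks still get fewer clicks.
    print(cascade_click_probabilities([0.5, 0.5, 0.5]))
```

Under these assumptions the same document receives fewer clicks the lower it is placed, and how much fewer depends on the documents ranked above it rather than on the rank number alone.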
Index Terms
- An experimental comparison of click position-bias models