ABSTRACT
Click logs present a wealth of evidence about how users interact with a search system. This evidence has been used for many things: learning rankings, personalizing, evaluating effectiveness, and more. But it is almost always distilled into point estimates of feature or parameter values, ignoring what may be the most salient feature of users: their variability. No two users interact with a system in exactly the same way, and even a single user may interact with results for the same query differently depending on information need, mood, time of day, and a host of other factors. We present a Bayesian approach to using logs to compute posterior distributions for probabilistic models of user interactions. Since they are distributions rather than point estimates, they naturally capture variability in the population. We show how to cluster posterior distributions to discover patterns of user interactions in logs, and discuss how to use the clusters to evaluate search engines according to a user model. Because the approach is Bayesian, our methods can be applied to very large logs (such as those possessed by Web search engines) as well as very small (such as those found in almost any other setting).
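As a rough illustration of the idea (a minimal sketch, not the authors' method): suppose each user's probability of clicking a result is modeled as a Bernoulli parameter with a Beta prior. Each user's log then yields a Beta posterior via the conjugate update, and users can be clustered by comparing whole posterior distributions rather than point estimates, so population variability is preserved. The threshold, prior, and single-linkage clustering below are all assumptions for the sketch.

```python
from math import exp, lgamma, sqrt

def beta_posterior(clicks, skips, a0=1.0, b0=1.0):
    """Conjugate update: Beta(a0, b0) prior plus Bernoulli click observations."""
    return a0 + clicks, b0 + skips

def _log_beta(a, b):
    # log of the Beta function B(a, b) via log-gamma for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def hellinger(p, q):
    """Closed-form Hellinger distance between Beta(a1,b1) and Beta(a2,b2).

    The Bhattacharyya coefficient of two Beta densities is
    B((a1+a2)/2, (b1+b2)/2) / sqrt(B(a1,b1) * B(a2,b2)).
    """
    (a1, b1), (a2, b2) = p, q
    log_bc = (_log_beta((a1 + a2) / 2, (b1 + b2) / 2)
              - 0.5 * (_log_beta(a1, b1) + _log_beta(a2, b2)))
    return sqrt(max(0.0, 1.0 - exp(log_bc)))

def cluster(posteriors, threshold=0.3):
    """Naive single-linkage clustering of posterior distributions:
    merge groups containing any pair of posteriors within `threshold`."""
    clusters = [[p] for p in posteriors]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(hellinger(p, q) < threshold
                       for p in clusters[i] for q in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Three hypothetical users: two frequent clickers and one rare clicker.
users = [beta_posterior(30, 5), beta_posterior(28, 6), beta_posterior(2, 40)]
groups = cluster(users)
```

With these toy counts the two heavy clickers end up in one cluster and the rare clicker in another; in the paper's setting the clusters would then be used to drive a user model for evaluation.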
Index Terms
- Incorporating variability in user behavior into systems based evaluation