Research article
DOI: 10.1145/1458082.1458092

How does clickthrough data reflect retrieval quality?

Published: 26 October 2008

Abstract

Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user-centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.

    Published In

    CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management
    October 2008, 1562 pages
    ISBN: 9781595939913
    DOI: 10.1145/1458082

    Publisher

    Association for Computing Machinery, New York, NY, United States

    Author Tags

    1. clickthrough data
    2. expert judgments
    3. implicit feedback
    4. retrieval evaluation

    Qualifiers

    • Research-article

    Conference

    CIKM '08: Conference on Information and Knowledge Management
    October 26-30, 2008
    Napa Valley, California, USA

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
