Research article
DOI: 10.1145/2391224.2391227

Modeling user variance in time-biased gain

Published: 04 October 2012

ABSTRACT

Cranfield-style information retrieval evaluation considers variance in user information needs by evaluating retrieval systems over a set of search topics. For each search topic, traditional metrics model all users searching ranked lists in exactly the same manner and thus have zero variance in their per-topic estimate of effectiveness. Metrics that fail to model user variance overestimate the effect size of differences between retrieval systems. The modeling of user variance is critical to understanding the impact of effectiveness differences on the actual user experience. If the variance of a difference is high, the effect on user experience will be low. Time-biased gain is an evaluation metric that models user interaction with ranked lists that are displayed using document surrogates. In this paper, we extend the stochastic simulation of time-biased gain to model the variation between users. We validate this new version of time-biased gain by showing that it produces distributions of gain that agree well with actual distributions produced by real users. With a per-topic variance in its effectiveness measure, time-biased gain allows for the measurement of the effect size of differences, which allows researchers to understand the extent to which predicted performance improvements matter to real users.
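
The simulation the abstract describes is easy to make concrete. Below is a minimal sketch, in Python, of a stochastic per-user simulation in the spirit of time-biased gain: each simulated user draws their own behavior parameters (summary-judging time, reading speed, click and save probabilities), walks down a ranked list accumulating elapsed time, and collects gain discounted by an exponential decay in time. Every distribution, parameter value, relevance vector, and the half-life used here are illustrative assumptions for exposition; the paper calibrates its model against recorded user data, which this sketch does not attempt.

```python
import math
import random
from statistics import mean, stdev

HALF_LIFE = 224.0  # seconds; illustrative half-life for the exponential time decay

def decay(t: float) -> float:
    """Fraction of users assumed to still be searching at elapsed time t."""
    return math.exp(-t * math.log(2) / HALF_LIFE)

def sample_user(rng: random.Random) -> dict:
    """Draw one simulated user's parameters; all distributions are illustrative assumptions."""
    clamp = lambda p: min(1.0, max(0.0, p))
    return {
        "t_summary": rng.lognormvariate(math.log(4.0), 0.35),  # mean seconds to judge a summary
        "read_rate": rng.lognormvariate(math.log(4.0), 0.30),  # seconds per 100 words read
        "p_click_rel": clamp(rng.gauss(0.65, 0.10)),   # P(click | summary of a relevant doc)
        "p_click_non": clamp(rng.gauss(0.35, 0.10)),   # P(click | summary of a non-relevant doc)
        "p_save_rel": clamp(rng.gauss(0.75, 0.10)),    # P(save | read a relevant doc)
    }

def simulate_gain(rels, words, user, rng) -> float:
    """One user's walk down one ranked list: accumulate time, collect time-discounted gain."""
    t = gain = 0.0
    for rel, w in zip(rels, words):
        t += rng.expovariate(1.0 / user["t_summary"])        # time spent judging the summary
        if rng.random() < (user["p_click_rel"] if rel else user["p_click_non"]):
            t += (w / 100.0) * user["read_rate"]             # time spent reading the document
            if rel and rng.random() < user["p_save_rel"]:
                gain += decay(t)                             # unit gain, discounted by elapsed time
    return gain

def gain_distribution(rels, words, n_users, rng):
    """Monte Carlo over simulated users yields a distribution of gain, not a point value."""
    return [simulate_gain(rels, words, sample_user(rng), rng) for _ in range(n_users)]

if __name__ == "__main__":
    rng = random.Random(42)
    words = [600, 900, 450, 1200, 700, 800, 500, 1000, 650, 750]  # hypothetical doc lengths
    sys_a = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # hypothetical relevance down system A's ranking
    sys_b = [0, 1, 0, 0, 1, 1, 0, 0, 1, 0]  # system B ranks the same relevant docs lower
    a = gain_distribution(sys_a, words, 10_000, rng)
    b = gain_distribution(sys_b, words, 10_000, rng)
    pooled = math.sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    print(f"A: {mean(a):.3f} +/- {stdev(a):.3f}   B: {mean(b):.3f} +/- {stdev(b):.3f}")
    print(f"standardized effect size: {(mean(a) - mean(b)) / pooled:.2f}")
```

Because each system now yields a distribution of gain for a topic rather than a single number, a standardized effect size (difference in means over a pooled standard deviation) can be read directly off the simulation output. This is the measurement the abstract argues is needed: if the per-user variance is large relative to the difference in means, the effect size is small and the improvement is unlikely to be noticeable to real users.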


Published in

HCIR '12: Proceedings of the Symposium on Human-Computer Interaction and Information Retrieval
October 2012, 42 pages
ISBN: 9781450317962
DOI: 10.1145/2391224

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
