ABSTRACT
Cranfield-style information retrieval evaluation considers variance in user information needs by evaluating retrieval systems over a set of search topics. For each search topic, traditional metrics model all users searching ranked lists in exactly the same manner and thus have zero variance in their per-topic estimate of effectiveness. Metrics that fail to model user variance overestimate the effect size of differences between retrieval systems. The modeling of user variance is critical to understanding the impact of effectiveness differences on the actual user experience. If the variance of a difference is high, the effect on user experience will be low. Time-biased gain is an evaluation metric that models user interaction with ranked lists that are displayed using document surrogates. In this paper, we extend the stochastic simulation of time-biased gain to model the variation between users. We validate this new version of time-biased gain by showing that it produces distributions of gain that agree well with actual distributions produced by real users. With a per-topic variance in its effectiveness measure, time-biased gain allows for the measurement of the effect size of differences, which allows researchers to understand the extent to which predicted performance improvements matter to real users.
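The kind of simulation the abstract describes can be sketched in a few lines: sample per-user behavior parameters once per simulated user, walk the ranked list accumulating time, and discount each relevant document's gain by an exponential decay in the time at which it is found. This is a minimal illustration only; the timing distributions, click probabilities, and the 224-second half-life below are illustrative assumptions, not the paper's calibrated values.

```python
import math
import random

def simulate_user(relevance, half_life=224.0, rng=random):
    """Simulate one user working down a ranked list; return their gain.

    relevance: list of 0/1 judgments by rank.
    half_life: gain decay half-life in seconds (illustrative).
    """
    # Per-user parameters sampled once, giving between-user variance.
    summary_time = max(1.0, rng.gauss(4.0, 1.0))   # seconds per snippet
    t = 0.0
    gain = 0.0
    for rel in relevance:
        t += summary_time                          # scan the snippet
        p_click = 0.65 if rel else 0.25            # click-through prob.
        if rng.random() < p_click:
            t += max(5.0, rng.gauss(30.0, 10.0))   # read the document
            if rel and rng.random() < 0.8:         # recognize and save it
                gain += math.pow(0.5, t / half_life)
    return gain

def tbg_distribution(relevance, n_users=1000, seed=0):
    """Monte Carlo distribution of gain over simulated users."""
    rng = random.Random(seed)
    return [simulate_user(relevance, rng=rng) for _ in range(n_users)]

gains = tbg_distribution([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
mean = sum(gains) / len(gains)
var = sum((g - mean) ** 2 for g in gains) / (len(gains) - 1)
```

Because each simulated user draws their own scan and read times, the resulting `gains` list has nonzero per-topic variance, which is what permits effect-size computations when comparing two systems.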