Abstract
Information retrieval research has shown that system performance does not always correlate positively with user performance, and that users often assign positive evaluation scores to search systems even when they cannot complete their tasks successfully. This research investigated the relationship between objective measures of system performance and users' perceptions of that performance. Subjects evaluated the performance of four search systems whose results were systematically manipulated to vary the ordering and number of relevant documents. Three laboratory studies were conducted with a total of eighty-one subjects. The first two studies examined how the ordering of five relevant and five nonrelevant documents within a ten-item results list affected subjects' evaluations. The third study examined how varying the number of relevant documents within a ten-item results list affected subjects' evaluations. Results demonstrate linear relationships between subjects' evaluations and both the position of relevant documents in the results list and the total number of relevant documents retrieved. Of the two, the number of relevant documents retrieved was the stronger predictor of subjects' evaluation ratings and led subjects to use a greater range of evaluation scores.
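To make the reported linear relationships concrete, the sketch below fits an ordinary least-squares model of the form rating ≈ b0 + b1·(number of relevant documents) + b2·(mean rank of the relevant documents) for a ten-item results list. The data points, the seven-point rating scale, and the specific model form are illustrative assumptions for exposition, not the study's actual measurements or analysis.

```python
# Minimal sketch (hypothetical data): relating a subject's evaluation rating
# to (a) the number of relevant documents in a ten-item results list and
# (b) the mean rank of those relevant documents. All values are invented
# for illustration; they are not the study's data.
import numpy as np

# Each row: (number of relevant docs, mean rank of relevant docs) for one
# manipulated ten-item results list; y holds a subject's rating on a 1-7 scale.
X = np.array([
    [1, 9.0],
    [3, 6.0],
    [5, 6.0],
    [5, 3.0],  # same count, relevant documents pushed toward the top
    [7, 4.0],
    [9, 5.0],
], dtype=float)
y = np.array([1.5, 3.0, 4.0, 4.5, 5.5, 6.5])

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b_count, b_rank = coef
print(f"rating ~ {intercept:.2f} + {b_count:.2f}*n_relevant + {b_rank:.2f}*mean_rank")
```

Under the abstract's findings one would expect b_count to be positive and larger in effect than b_rank, with b_rank negative (relevant documents nearer the top of the list yield higher ratings).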