ABSTRACT
Given the size of the web, the search engine industry has argued that engines should be evaluated by their ability to retrieve highly relevant pages rather than all possible relevant pages. To explore the role highly relevant documents play in retrieval system evaluation, assessors for the TREC-9 web track used a three-point relevance scale and also selected best pages for each topic. The relative effectiveness of runs evaluated by different relevant document sets differed, confirming the hypothesis that different retrieval techniques work better for retrieving highly relevant documents. Yet evaluating by highly relevant documents alone can be unstable, since there are relatively few highly relevant documents. TREC assessors frequently disagreed in their selection of the best page, and subsequent evaluation by best page varied widely across assessors. The discounted cumulative gain measure introduced by Järvelin and Kekäläinen increases evaluation stability by incorporating all relevance judgments while still giving precedence to highly relevant documents.
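The gain in DCG accrues down the ranking: each retrieved document contributes its graded relevance value, discounted by the logarithm of its rank, so a highly relevant document found early is worth far more than the same document found late, while every judged document still contributes something. The sketch below is a minimal Python rendering of that computation; the gain scale (0 = not relevant, 1 = relevant, 2 = highly relevant) and the two example rankings are illustrative assumptions, not figures from the track.

```python
import math

def dcg(gains, base=2):
    """Discounted cumulative gain (Järvelin and Kekäläinen, 2000).

    gains -- graded relevance values of the retrieved documents,
             in rank order (rank 1 first).
    base  -- logarithm base b; ranks up to b are left undiscounted,
             following the original definition.
    """
    score = 0.0
    for rank, gain in enumerate(gains, start=1):
        # Discount each gain by log_b(rank) once past the first b ranks.
        discount = math.log(rank, base) if rank > base else 1.0
        score += gain / discount
    return score

# Two hypothetical runs on the same topic: one retrieves the single
# highly relevant page (gain 2) at rank 1, the other buries it at rank 5.
print(dcg([2, 1, 0, 1, 0]))  # 3.5
print(dcg([1, 1, 0, 0, 2]))  # ~2.86
```

Because all relevance judgments feed the score, two assessors who agree on which documents are relevant but disagree about which single page is best will still produce similar DCG values, which is the source of the stability noted above.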
REFERENCES
- 1. Internet Archive. The Internet Archive: Building an 'Internet Library'. http://www.archive.org.
- 2. Pia Borlund and Peter Ingwersen. Measures of relative relevance and ranked half-life: Performance indicators for interactive IR. In W. Bruce Croft, Alistair Moffat, C.J. van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 324-331, Melbourne, Australia, August 1998. ACM Press, New York.
- 3. Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World Wide Web Conference, http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm, 1998.
- 4. Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. In N. Belkin, P. Ingwersen, and M.K. Leong, editors, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 33-40, 2000.
- 5. W.S. Cooper. Expected search length: A single measure of retrieval effectiveness based on the weak ordering action of retrieval systems. American Documentation, 19:30-41, January 1968.
- 6. Michael Gordon and Praveen Pathak. Finding information on the world wide web: the retrieval effectiveness of search engines. Information Processing and Management, 35:141-180, 1999.
- 7. David Hawking, Nick Craswell, and Paul Thistlewaite. Overview of TREC-7 very large collection track. In E.M. Voorhees and D.K. Harman, editors, Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 91-103, August 1999. NIST Special Publication 500-242. Electronic version available at http://trec.nist.gov/pubs.html.
- 8. David Hawking, Nick Craswell, Paul Thistlewaite, and Donna Harman. Results and challenges in web search evaluation. In Proceedings of the Eighth International World Wide Web Conference, http://www8.org/w8-papers/2c-search-discover/results/results.html, 1999.
- 9. David Hawking, Ellen Voorhees, Nick Craswell, and Peter Bailey. Overview of the TREC-8 web track. In E.M. Voorhees and D.K. Harman, editors, Proceedings of the Eighth Text REtrieval Conference (TREC-8), pages 131-150, 2000. NIST Special Publication 500-246. Electronic version available at http://trec.nist.gov/pubs.html.
- 10. Kalervo Järvelin and Jaana Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Nicholas J. Belkin, Peter Ingwersen, and Mun-Kew Leong, editors, Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 41-48, July 2000.
- 11. Chris Sherman. Special report: the 5th annual search engines meeting. http://websearch.about.com/internet/websearch/library/blsem.htm. Section "The Fireworks Fly".
- 12. Alan Stuart. Kendall's tau. In Samuel Kotz and Norman L. Johnson, editors, Encyclopedia of Statistical Sciences, volume 4, pages 367-369. John Wiley & Sons, 1983.
- 13. Bob Travis and Andrei Broder. The need behind the query: Web search vs classic information retrieval. http://www.infonortics.com/searchengines/sh01/slides-01/sh01pro.html.
- 14. Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management, 36:697-716, 2000.
- 15. Ellen M. Voorhees and Donna Harman. Overview of the sixth Text REtrieval Conference (TREC-6). Information Processing and Management, 36(1):3-35, January 2000.