ABSTRACT
Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the door to a new evaluation approach. We use the ODP (Open Directory Project) taxonomy to find sets of pseudo-relevant documents via one of two assumptions: 1) taxonomy entries are relevant to a given query if their editor-entered titles exactly match the query, or 2) all entries in a leaf-level taxonomy category are relevant to a given query if the category title exactly matches the query. We compare and contrast these two methodologies by evaluating six web search engines on a sample from an America Online log of ten million web queries, using mean reciprocal rank (MRR) for the first method and precision-based measures for the second. We show that the technique is stable with respect to the choice of query set and that its results correlate with a reasonably large manual evaluation.
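The two pseudo-relevance assumptions and the two scoring regimes described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `OdpEntry` record, the normalization used for "exactly match" (lower-casing and collapsing whitespace), the `engine_results` format (query mapped to a ranked list of URLs), the rule of skipping queries with no pseudo-relevant documents, and the cutoff `k` are all hypothetical choices, since the abstract does not specify them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OdpEntry:
    """Hypothetical ODP record: editor-entered title, URL, and the title of
    the leaf-level category the entry is listed under."""
    title: str
    url: str
    leaf_category: str


def normalize(text: str) -> str:
    # Assumption: "exact match" means equality after lower-casing and
    # collapsing whitespace; the abstract does not specify normalization.
    return " ".join(text.lower().split())


def title_match_relevant(query: str, entries: list[OdpEntry]) -> set[str]:
    """Assumption 1: an entry is relevant if its title equals the query."""
    q = normalize(query)
    return {e.url for e in entries if normalize(e.title) == q}


def category_match_relevant(query: str, entries: list[OdpEntry]) -> set[str]:
    """Assumption 2: every entry in a leaf category whose title equals the
    query is relevant."""
    q = normalize(query)
    return {e.url for e in entries if normalize(e.leaf_category) == q}


def reciprocal_rank(ranking: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant URL, or 0.0 if none is retrieved."""
    for rank, url in enumerate(ranking, start=1):
        if url in relevant:
            return 1.0 / rank
    return 0.0


def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k positions holding a relevant URL; positions
    beyond the end of a short ranking count as non-relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for url in ranking[:k] if url in relevant) / k


def evaluate_engine(
    engine_results: dict[str, list[str]],
    queries: list[str],
    entries: list[OdpEntry],
    k: int = 10,
) -> tuple[float, float]:
    """Return (MRR over title-matched queries, mean P@k over
    category-matched queries). A query with no pseudo-relevant documents
    under one assumption is skipped for that measure (hypothetical rule)."""
    rr_scores: list[float] = []
    p_scores: list[float] = []
    for query in queries:
        ranking = engine_results.get(query, [])
        title_rel = title_match_relevant(query, entries)
        if title_rel:
            rr_scores.append(reciprocal_rank(ranking, title_rel))
        category_rel = category_match_relevant(query, entries)
        if category_rel:
            p_scores.append(precision_at_k(ranking, category_rel, k))
    mrr = sum(rr_scores) / len(rr_scores) if rr_scores else 0.0
    mean_p = sum(p_scores) / len(p_scores) if p_scores else 0.0
    return mrr, mean_p


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    entries = [
        OdpEntry("Amtrak", "amtrak.com", "Railroads"),
        OdpEntry("Rail Europe", "raileurope.com", "Railroads"),
        OdpEntry("Via Rail", "viarail.ca", "Railroads"),
    ]
    results = {
        "amtrak": ["example.org", "amtrak.com"],
        "railroads": ["viarail.ca", "example.org", "amtrak.com", "foo.net"],
    }
    print(evaluate_engine(results, ["amtrak", "railroads"], entries, k=3))
    # → (0.5, 0.6666666666666666)
```

In this toy run, "amtrak" matches an entry title (method 1), so it contributes a reciprocal rank of 1/2; "railroads" matches a leaf category title (method 2), so all three entries in that category count as relevant and two of them appear in the top 3. Comparing engines would mean calling `evaluate_engine` once per engine on the same query sample.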
REFERENCES
[1] Beitzel, S. M., Jensen, E. C., Chowdhury, A., Grossman, D., and Frieder, O. Using manually-built web directories for automatic evaluation of known-item retrieval. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Toronto, Canada, July 28-August 1, 2003). doi:10.1145/860435.860507

[2] Boyan, J., Freitag, D., and Joachims, T. A machine learning architecture for optimizing web search engines. In AAAI'96 Workshop on Internet Based Information Systems (August 1996). http://www.cs.cornell.edu/People/tj/publications/boyan_etal_96a.pdf

[4] Bruza, P., McArthur, R., and Dennis, S. Interactive Internet search: keyword, directory and query reformulation mechanisms compared. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Athens, Greece, July 24-28, 2000), 280-287. doi:10.1145/345508.345598

[6] Buckley, C. Proposal to TREC Web Track mailing list (November 2001). http://groups.yahoo.com/group/webir/message/760

[8] Craswell, N., Bailey, P., and Hawking, D. Is it fair to evaluate Web systems using TREC ad hoc methods? In SIGIR'99 Workshop on Web Evaluation (Berkeley, CA, 1999). http://pigfish.vic.cmis.csiro.au/~nickc/pubs/sigir99ws.ps.gz

[9] Ding, W., and Marchionini, G. Comparative study of web search service performance. In Proceedings of the ASIS 1996 Annual Conference (October 1996).

[11] Hawking, D., Craswell, N., and Thistlewaite, P. Overview of TREC-7 Very Large Collection Track. In Proceedings of TREC-7 (Gaithersburg, MD, 1998), NIST Special Publication 500-242, 91-104.

[12] Hawking, D., Voorhees, E., Craswell, N., and Bailey, P. Overview of the TREC-8 Web Track. In Proceedings of TREC-8 (Gaithersburg, MD, 1999), NIST Special Publication 500-246, 131-149.

[14] Hawking, D., and Craswell, N. Overview of the TREC-2001 Web Track. In Proceedings of TREC-10 (Gaithersburg, MD, 2001), NIST Special Publication 500-250, 61-67.

[16] Hawking, D., Craswell, N., and Griffiths, K. Which search engine is best at finding airline site home pages? CMIS Technical Report 01/45 (March 2001). http://pigfish.vic.cmis.csiro.au/~nickc/pubs/TR01-45.pdf

[17] Hawking, D., Craswell, N., and Griffiths, K. Which Search Engine is Best at Finding Online Services? In Proceedings of WWW10 (Hong Kong, May 2001), Posters. Poster available at http://pigfish.vic.cmis.csiro.au/~nickc/pubs/www10actualposter.pdf

[18] Hawking, D., and Craswell, N. Overview of the TREC-2002 Web Track. To appear in Proceedings of TREC-11 (Gaithersburg, MD, 2002).

[19] Haveliwala, T. H., Gionis, A., Klein, D., and Indyk, P. Evaluating strategies for similarity search on the web. In Proceedings of the 11th International Conference on World Wide Web (Honolulu, Hawaii, USA, May 7-11, 2002). doi:10.1145/511446.511502

[21] Joachims, T. Evaluating Retrieval Performance using Clickthrough Data. In SIGIR'02 Workshop on Mathematical/Formal Methods in Information Retrieval (Tampere, Finland, August 2002). http://www.cs.cornell.edu/People/tj/publications/joachims_02b.pdf

[24] Menczer, F. Semi-Supervised Evaluation of Search Engines via Semantic Mapping. Submitted to WWW'03 (Budapest, Hungary, 2003), ACM Press. http://dollar.biz.uiowa.edu/~fil/Papers/engines.pdf

CITED BY 5
Beitzel, S. M., Jensen, E. C., Chowdhury, A., Grossman, D., and Frieder, O. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Sheffield, United Kingdom, July 25-29, 2004).