ACM Home Page
Please provide us with feedback. Feedback
Hits on the web: how does it compare?
Full text PdfPdf (243 KB)
Source
Annual ACM Conference on Research and Development in Information Retrieval archive
Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval table of contents
Amsterdam, The Netherlands
SESSION: Link analysis table of contents
Pages: 471 - 478  
Year of Publication: 2007
ISBN:978-1-59593-597-7
Authors
Marc A. Najork  Microsoft Research, Mountain View, CA
Hugo Zaragoza  Yahoo! Research, Barcelona, Spain
Michael J. Taylor  Microsoft Research, Cambridge, United Kingdom
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 40,   Downloads (12 Months): 317,   Citation Count: 1
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1277741.1277823
What is a DOI?

ABSTRACT

This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: the mean reciprocal rank, the mean average precision, and the normalized discounted cumulative gain measurements. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
4
 
5
6
 
7
8
 
9
E. Garfield. Citation analysis as a tool in journal evaluation. Science 178(4060):471--479, 1972.
 
10
Z. Gyöngyi and H. Garcia-Molina. Web spam taxonomy. In 1st International Workshop on Adversarial Information Retrieval on the Web 2005.
 
11
12
13
14
 
15
M.M. Kessler. Bibliographic coupling between scientific papers. American Documentation 14(1):10--25, 1963.
 
16
17
 
18
A.N. Langville and C.D. Meyer. Deeper inside PageRank. Internet Mathematics 1(3):2005, 335--380.
 
19
20
 
21
L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.
22
 
23
T. Upstill, N. Craswell, and D. Hawking. Predicting fame and fortune: Pagerank or indegree? In Proc. of the Australasian Document Computing Symposium pages 31--40, 2003.
 
24
H. Zaragoza, N. Craswell, M. Taylor, S. Saria, and S. Robertson. Microsoft Cambridge at TREC-13: Web and HARD tracks. In Proc. of the 13th Text Retrieval Conference 2004.


Collaborative Colleagues:
Marc A. Najork: colleagues
Hugo Zaragoza: colleagues
Michael J. Taylor: colleagues