DOI: 10.1145/1277741.1277823

HITS on the web: how does it compare?

Published: 23 July 2007

Abstract

This paper describes a large-scale evaluation of the effectiveness of HITS in comparison with other link-based ranking algorithms, when used in combination with a state-of-the-art text retrieval algorithm exploiting anchor text. We quantified their effectiveness using three common performance measures: the mean reciprocal rank, the mean average precision, and the normalized discounted cumulative gain measurements. The evaluation is based on two large data sets: a breadth-first search crawl of 463 million web pages containing 17.6 billion hyperlinks and referencing 2.9 billion distinct URLs; and a set of 28,043 queries sampled from a query log, each query having on average 2,383 results, about 17 of which were labeled by judges. We found that HITS outperforms PageRank, but is about as effective as web-page in-degree. The same holds true when any of the link-based features are combined with the text retrieval algorithm. Finally, we studied the relationship between query specificity and the effectiveness of selected features, and found that link-based features perform better for general queries, whereas BM25F performs better for specific queries.
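
The three performance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not give the exact formulations, so the sketch assumes a common NDCG variant (gain `2**rel - 1`, discount `log2(rank + 1)`), treats any label above zero as relevant for MRR and MAP, and normalizes average precision by the relevant results present in the ranked list. All function names are hypothetical.

```python
import math


def reciprocal_rank(labels):
    """1/rank of the first relevant result; labels are in ranked order."""
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:  # assumption: any positive label counts as relevant
            return 1.0 / rank
    return 0.0


def average_precision(labels):
    """Mean of precision@k over the ranks k that hold a relevant result."""
    hits = 0
    total = 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    # Assumption: every relevant result appears somewhere in the list.
    return total / hits if hits else 0.0


def dcg(labels):
    """Discounted cumulative gain with gain 2**rel - 1 (assumed variant)."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(labels, start=1))


def ndcg(labels):
    """DCG divided by the DCG of the ideal (descending-label) ordering."""
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def mean(values):
    """Arithmetic mean of per-query scores; 0.0 for an empty collection."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Two toy queries, each a list of judged labels in ranked order.
runs = [[0, 1, 0], [1, 0, 0]]
mrr = mean(reciprocal_rank(r) for r in runs)         # 0.75
map_score = mean(average_precision(r) for r in runs)  # 0.75
mean_ndcg = mean(ndcg(r) for r in runs)
```

Averaging each per-query score over all queries gives the "mean" versions of the measures; a ranking produced by HITS, PageRank, or in-degree would be scored by feeding its judged labels, in ranked order, to these functions.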

Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BM25F
  2. HITS
  3. MRR
  4. NDCG
  5. PageRank
  6. ranking

Qualifiers

  • Article

Conference

SIGIR07
SIGIR07: The 30th Annual International SIGIR Conference
July 23 - 27, 2007
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
