Article

Beyond PageRank: machine learning for static ranking

Authors: Matthew Richardson, Eric Brill

WWW '06: Proceedings of the 15th international conference on World Wide Web

Pages 707 - 715

https://doi.org/10.1145/1135777.1135881

Published: 23 May 2006

Abstract

Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).
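
The pairwise accuracy figures quoted in the abstract can be made concrete with a small sketch. It assumes the usual definition of the metric: among all pairs of pages whose reference ratings differ, the fraction that the ranker orders the same way as the ratings. The function name `pairwise_accuracy`, the toy scores and ratings, and the choice to count score ties as errors are illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def pairwise_accuracy(scores, labels):
    """Fraction of differently-rated pairs that the scores order correctly.

    scores: static rank scores produced by a model (higher = better).
    labels: reference quality ratings for the same pages (higher = better).
    Pairs with equal labels are skipped; pairs with tied scores count as
    errors (an assumption made for this sketch).
    """
    correct = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # no preferred order for this pair
        total += 1
        # The pair is ordered correctly when both differences share a sign.
        if (labels[i] - labels[j]) * (scores[i] - scores[j]) > 0:
            correct += 1
    if total == 0:
        raise ValueError("no pairs with differing labels")
    return correct / total


# Hypothetical toy data: four pages, one pair (the 2nd and 3rd) misordered.
ratings = [3, 2, 1, 0]
model_scores = [0.9, 0.2, 0.5, 0.1]
print(pairwise_accuracy(model_scores, ratings))
```

Under this definition a ranker that scores pages at random is expected to order about half of the pairs correctly, which matches the 50% baseline quoted for random ordering.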

References

[1] B. Amento, L. Terveen, and W. Hill. Does "authority" mean quality? Predicting expert quality ratings of Web documents. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000.

[2] B. Bartell, G. Cottrell, and R. Belew. Automatic combination of multiple ranked retrieval systems. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1994.

[3] P. Boldi, M. Santini, and S. Vigna. PageRank as a function of the damping factor. In Proceedings of the International World Wide Web Conference, May 2005.

[4] J. Boyan, D. Freitag, and T. Joachims. A machine learning architecture for optimizing web search engines. In AAAI Workshop on Internet Based Information Systems, August 1996.

[5] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, 1998. Elsevier.

[6] A. Broder, R. Lempel, F. Maghoul, and J. Pederson. Efficient PageRank approximation via graph aggregation. In Proceedings of the International World Wide Web Conference, May 2004.

[7] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005.

[8] D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici, Y. S. Maarek, and A. Soffer. Static index pruning for information retrieval systems. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 43-50, New Orleans, Louisiana, USA, September 2001.

[9] J. Cho and S. Roy. Impact of search engines on page popularity. In Proceedings of the International World Wide Web Conference, May 2004.

[10] J. Cho, S. Roy, and R. Adams. Page Quality: In search of an unbiased web ranking. In Proceedings of the ACM SIGMOD 2005 Conference, Baltimore, Maryland, June 2005.

[11] N. Craswell, S. Robertson, H. Zaragoza, and M. Taylor. Relevance weighting for query independent evidence. In Proceedings of the 28th Annual Conference on Research and Development in Information Retrieval (SIGIR), August 2005.

[12] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial Classification. In Proceedings of the Tenth International Conference on Knowledge Discovery and Data Mining, pp. 99-108, Seattle, WA, 2004.

[13] O. Dekel, C. Manning, and Y. Singer. Log-linear models for label-ranking. In Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press, 2003.

[14] S. Fox, K. Karnawat, M. Mydland, S. T. Dumais, and T. White. Evaluating implicit measures to improve the search experiences. In ACM Transactions on Information Systems, 23(2), pp. 147-168, April 2005.

[15] T. Haveliwala. Efficient computation of PageRank. Stanford University Technical Report, 1999.

[16] T. Haveliwala. Topic-sensitive PageRank. In Proceedings of the International World Wide Web Conference, May 2002.

[17] D. Hawking and N. Craswell. Very large scale retrieval and Web search. In D. Harman and E. Voorhees (eds), The TREC Book. MIT Press.

[18] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. In Proceedings of the Ninth International Conference on Artificial Neural Networks, pp. 97-102, 1999.

[19] M. Ivory and M. Hearst. Statistical profiles of highly-rated Web sites. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 2002.

[20] T. Joachims. Optimizing search engines using clickthrough data. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), 2002.

[21] T. Joachims, L. Granka, B. Pang, H. Hembrooke, and G. Gay. Accurately Interpreting Clickthrough Data as Implicit Feedback. In Proceedings of the Conference on Research and Development in Information Retrieval (SIGIR), 2005.

[22] J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM 46:5, pp. 604-32, 1999.

[23] A. Langville and C. Meyer. Deeper inside PageRank. Internet Mathematics 1(3):335-380, 2004.

[24] F. Matthieu and M. Bouklit. The effect of the back button in a random walk: application for PageRank. In Alternate track papers and posters of the Thirteenth International World Wide Web Conference, 2004.

[25] F. McSherry. A uniform approach to accelerated PageRank computation. In Proceedings of the International World Wide Web Conference, May 2005.

[26] Y. Minamide. Static approximation of dynamically generated Web pages. In Proceedings of the International World Wide Web Conference, May 2005.

[27] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, Stanford, CA, 1998.

[28] S. Pandey and C. Olston. User-centric Web crawling. In Proceedings of the International World Wide Web Conference, May 2005.

[29] M. Richardson and P. Domingos. The intelligent surfer: probabilistic combination of link and content information in PageRank. In Advances in Neural Information Processing Systems 14, pp. 1441-1448. Cambridge, MA: MIT Press, 2002.

[30] C. Sherman. Teoma vs. Google, Round 2. Available from World Wide Web (http://dc.internet.com/news/article.php/1002061), 2002.

[31] T. Upstill, N. Craswell, and D. Hawking. Predicting fame and fortune: PageRank or indegree? In the Eighth Australasian Document Computing Symposium, 2003.

[32] T. Upstill, N. Craswell, and D. Hawking. Query-independent evidence in home page finding. In ACM Transactions on Information Systems, 2003.

Index Terms

Beyond PageRank: machine learning for static ranking
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information retrieval

Published In

WWW '06: Proceedings of the 15th international conference on World Wide Web

May 2006

1102 pages

ISBN:1595933239

DOI:10.1145/1135777

General Chairs: Leslie Carr (University of Southampton), David De Roure (University of Southampton), Arun Iyengar (IBM Research)

Program Chairs: Carole Goble (University of Manchester, UK), Mike Dahlin (University of Texas at Austin)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

WWW06

Sponsor:

WWW06: The 15th International World Wide Web Conference 2006

May 23 - 26, 2006

Edinburgh, Scotland

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics & Citations

Article Metrics

Total Citations: 115
Total Downloads: 2,280
Downloads (Last 12 months): 52
Downloads (Last 6 weeks): 2

Reflects downloads up to 15 Feb 2025

Citations

Cited By

Aydın A, Arslan A, Dinçer B (2024). A set of novel HTML document quality features for Web information retrieval: Including applications to learning to rank for information retrieval. Expert Systems with Applications 246 (123177). Online publication date: Jul-2024. https://doi.org/10.1016/j.eswa.2024.123177

Wang Y, Venkatesh P, Lim B (2022). Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-28). Online publication date: 29-Apr-2022. https://dl.acm.org/doi/10.1145/3491102.3517551

Zhang W, Lim B (2022). Towards Relatable Explainable AI with the Perceptual Process. Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (1-24). Online publication date: 29-Apr-2022. https://dl.acm.org/doi/10.1145/3491102.3501826

Parsa K, Hassall M, Naderpour M (2022). Enhancing Alarm Prioritization in the Alarm Management Lifecycle. IEEE Access 10 (99-111). Online publication date: 2022. https://doi.org/10.1109/ACCESS.2021.3137865

Mitra A, Kundu A, Chattopadhyay M, Banerjee A (2022). An Approach to Detect Fake Profiles in Social Networks Using Cellular Automata-Based PageRank Validation Model Involving Energy Transfer. SN Computer Science 3:6. Online publication date: 6-Aug-2022. https://doi.org/10.1007/s42979-022-01315-6

Aggarwal C (2022). Information Retrieval and Search Engines. Machine Learning for Text (257-302). Online publication date: 10-Feb-2022. https://doi.org/10.1007/978-3-030-96623-2_9

Jaton F (2021). Assessing biases, relaxing moralism: On ground-truthing practices in machine learning design and application. Big Data & Society 8:1. Online publication date: 5-May-2021. https://doi.org/10.1177/20539517211013569

Sheikholeslami S, Meister M, Wang T, Payberah A, Vlassov V, Dowling J (2021). AutoAblation. Proceedings of the 1st Workshop on Machine Learning and Systems (55-61). Online publication date: 26-Apr-2021. https://dl.acm.org/doi/10.1145/3437984.3458834

Parsa K, Hassall M, Naderpour M (2021). Process Alarm Modeling Using Graph Theory: Alarm Design Review and Rationalization. IEEE Systems Journal 15:2 (2257-2268). Online publication date: Jun-2021. https://doi.org/10.1109/JSYST.2020.3019041

Ali F, Khusro S (2021). Content and link-structure perspective of ranking webpages. Computer Science Review 40:C. Online publication date: 1-May-2021. https://dl.acm.org/doi/10.1016/j.cosrev.2021.100397
