DOI: 10.1145/1390334.1390411
research-article

Separate and inequal: preserving heterogeneity in topical authority flows

Published: 20 July 2008

Abstract

Web pages, like people, are often known by others in a variety of contexts. When those contexts are sufficiently distinct, a page's importance may be better represented by multiple domains of authority, rather than by one that indiscriminately mixes reputations. In this work we determine domains of authority by examining the contexts in which a page is cited. However, we find that it is not enough to determine separate domains of authority; our model additionally determines the local flow of authority based upon the relative similarity of the source and target authority domains. In this way, we differentiate both incoming and outgoing hyperlinks by topicality and importance rather than treating them indiscriminately. We find that this approach compares favorably to other topical ranking methods on two real-world datasets and produces an approximately 10% improvement in precision and quality of the top ten results over PageRank.
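The idea of letting authority flow according to the similarity of source and target domains can be illustrated with a small sketch. This is not the paper's model, which the abstract does not specify; it assumes each page has a single hypothetical topic vector, uses cosine similarity as the similarity measure, and covers only how one page's outgoing authority might be divided among its links. The names `cosine`, `flow_weights`, and the toy `topics` data are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def flow_weights(source, targets, topics):
    """Split one unit of authority from `source` among its link `targets`.

    Assumption (not from the paper): each link receives a share
    proportional to the cosine similarity of its endpoints' topic vectors.
    If every similarity is zero, fall back to the uniform split that
    plain PageRank uses.
    """
    if not targets:
        return {}
    sims = {t: cosine(topics[source], topics[t]) for t in targets}
    total = sum(sims.values())
    if total == 0:
        return {t: 1.0 / len(targets) for t in targets}
    return {t: s / total for t, s in sims.items()}


# Hypothetical topic vectors over two made-up domains: (sports, finance).
topics = {
    "a": (1.0, 0.0),  # purely sports
    "b": (1.0, 0.0),  # purely sports
    "c": (0.0, 1.0),  # purely finance
    "d": (1.0, 1.0),  # mixed
}

# Page "a" links to "b", "c", and "d".
weights = flow_weights("a", ["b", "c", "d"], topics)
for page, share in sorted(weights.items()):
    print(page, round(share, 3))
```

In this toy setup the on-topic target `b` receives the largest share, the mixed page `d` a smaller one, and the off-topic page `c` none, whereas a uniform split would give each a third.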



    Published In

    SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
    July 2008
    934 pages
    ISBN:9781605581644
    DOI:10.1145/1390334

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. PageRank
    2. link analysis
    3. reputation
    4. web search engine

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2017) Incremental maintenance of C-Rank scores in dynamic web environment. 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 1570-1574. DOI: 10.1109/SMC.2017.8122838. Online publication date: 5-Oct-2017
    • (2014) C-Rank and its variants. Journal of Information Science, 40:6, pages 761-778. DOI: 10.1177/0165551514545429. Online publication date: 1-Dec-2014
    • (2014) C-Rank. Proceedings of the 29th Annual ACM Symposium on Applied Computing, pages 908-912. DOI: 10.1145/2554850.2554910. Online publication date: 24-Mar-2014
    • (2010) Quantifying sentiment and influence in blogspaces. Proceedings of the First Workshop on Social Media Analytics, pages 53-61. DOI: 10.1145/1964858.1964866. Online publication date: 25-Jul-2010
    • (2010) Mining neighbors' topicality to better control authority flow. Proceedings of the 32nd European conference on Advances in Information Retrieval, pages 653-657. DOI: 10.1007/978-3-642-12275-0_69. Online publication date: 28-Mar-2010
