ABSTRACT
To find similar web pages to a query page on the Web, this paper introduces a novel link-based similarity measure, called PageSim. Contrast to SimRank, a recursive refinement of cocitation, PageSim can measure similarity between any two web pages, whereas SimRank cannot in some cases. We give some intuitions to the PageSim model, and outline the model with mathematical definitions. Finally, we give an example to illustrate its effectiveness.
- http://www.yahoo.com.Google Scholar
- A. Arasu, J. Cho, H. Garcia-Molina, A. Paepcke, and S. Raghavan. Searching the web. ACM Transactions on Internet Technology, 1(1):243, 2001. Google ScholarDigital Library
- J. Dean and M. R. Henzinger. Finding related pages in the World Wide Web. Computer Networks (Amsterdam, Netherlands: 1999), 31(1116):14671479, 1999. Google ScholarDigital Library
- G. Jeh and J. Widom. Simrank: a measure of structural-context similarity. In KDD '02: Proceedings of the eighth ACM SIGKDD, pages 538543, New York, NY, USA, 2002. ACM Press. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford Digital Library Technologies Project, 1998.Google Scholar
Index Terms
- PageSim: a novel link-based measure of web page aimilarity
Recommendations
ASCOS++: An Asymmetric Similarity Measure for Weighted Networks to Address the Problem of SimRank
In this article, we explore the relationships among digital objects in terms of their similarity based on vertex similarity measures. We argue that SimRank—a famous similarity measure—and its families, such as P-Rank and SimRank++, fail to capture ...
Accelerating pairwise SimRank estimation over static and dynamic graphs
Measuring similarities among different vertices is a fundamental problem in graph analysis. Among different similarity measurements, SimRank is one of the most promising and popular. In reality, instead of computing the whole similarity matrix, people ...
Assessing single-pair similarity over graphs by aggregating first-meeting probabilities
Link-based similarity plays an important role in measuring similarities between nodes in a graph. As a widely used link-based similarity, SimRank scores similarity between two nodes as the first-meeting probability of two random surfers. However, due to ...
Comments