ABSTRACT
We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. This complex dynamic network reveals a number of interesting structural properties, some in line with the well-studied boolean link host graph and others pointing to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable from past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to validate the PageRank random surfing model. The ranking obtained from the actual frequency with which users visit a site differs significantly from the ranking approximated by the uniform surfing and teleportation behavior modeled by PageRank, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show how each is violated by actual user behavior.
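The comparison between a PageRank ranking and a traffic-based ranking can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the four-host graph, the click counts, the damping value of 0.85, and the helper names `pagerank` and `kendall_tau` are hypothetical and do not come from the paper's data or code. The sketch computes PageRank with uniform link following and uniform teleportation, then measures how well it agrees with a ranking by visit counts using a simple Kendall rank correlation without tie correction.

```python
def pagerank(links, damping=0.85, iters=100):
    """Power iteration on a dict {host: [hosts it links to]}."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Uniform teleportation: every host gets the same baseline share.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                # Uniform surfing: each outgoing link is followed equally often.
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling host: spread its rank uniformly over all hosts.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def kendall_tau(x, y):
    """Kendall rank correlation (tau-a, no tie correction) of two score dicts."""
    keys = sorted(x)
    concordant = discordant = 0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            a = x[keys[i]] - x[keys[j]]
            b = y[keys[i]] - y[keys[j]]
            if a * b > 0:
                concordant += 1
            elif a * b < 0:
                discordant += 1
    pairs = len(keys) * (len(keys) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical toy host graph and hypothetical visit counts per host.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
traffic = {"a": 10, "b": 500, "c": 40, "d": 5}

pr = pagerank(links)
tau = kendall_tau(pr, traffic)
print(sorted(pr, key=pr.get, reverse=True))            # hosts ordered by PageRank
print(sorted(traffic, key=traffic.get, reverse=True))  # hosts ordered by traffic
print(tau)
```

A value of `tau` near 1 would mean the two rankings agree closely, while values near 0 indicate little agreement. In this toy case the host with the most visits is not the host PageRank ranks first, which is the kind of discrepancy the abstract describes.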