tutorial

A fast algorithm to find all high degree vertices in power law graphs

Authors:
Colin Cooper

King's College, London, United Kingdom

King's College, London, United Kingdom
View Profile

,
Tomasz Radzik

King's College, London, United Kingdom

King's College, London, United Kingdom
View Profile

,
Yiannis Siantos

King's College, London, United Kingdom

King's College, London, United Kingdom
View Profile

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide WebApril 2012Pages 1007–1016https://doi.org/10.1145/2187980.2188235

Published:16 April 2012Publication History

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web

Pages 1007–1016

ABSTRACT

Sampling from large graphs is an area which is of great interest, particularly with the recent emergence of huge structures such as Online Social Networks. These often contain hundreds of millions of vertices and billions of edges. The large size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer a problem.

In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biassing the random walk to acquire a complete sample of high degree vertices of social networks, or similar graphs. The preferential attachment model is a common method to generate graphs with a power law degree sequence. For this model, we prove that this sampling method is successful with high probability. We also make experimental studies of the method on various real world networks.

For t-vertex graphs G(t) generated by a preferential attachment process, we analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to [d(x)d(y)]^b, where d(x) is the degree of vertex x and b > 0 is a constant parameter. Let S(a) be the set of all vertices of degree at least t^a in G(t). We show that for some b approx 2/3, if the biassed random walk starts at an arbitrary vertex of S(a), then with high probability the set S(a) can be discovered completely in ~O(t^1-(4/3)a+d) steps, where d is a very small positive constant. The notation ~O ignores poly-log t factors.

The preferential attachment process generates graphs with power law 3, so the above example is a special case of this result. For graphs with degree sequence power law c>2 generated by a generalized preferential attachment process, a random walk with transitions along undirected edges {x,y} proportional to (d(x)d(y))^(c-2)/2, discovers the set S(a) completely in ~O(t^1-a(c-2)+d) steps with high probability. The cover time of the graph is ~O(t).

Our results say that if we search preferential attachment graphs with a bias b=(c-2)/2 proportional to the power law c then, (i) we can find all high degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.

References

D. Achlioptas, A. Clauset, D. Kempe, and C. Moore. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs.J. ACM, 56(4), 2009. Google ScholarDigital Library
D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs. http://stat-www.berkeley.edu/pub/users/aldous/RWG/book.html, 1995.Google Scholar
R. Baeza-Yates, C. Castillo, M. Marin, and A. Rodriguez. Crawling a country: Better strategies than breadth-first for web page ordering. In Proceedings of the 14th international conference on World Wide Web / Industrial and Practical Experience Track, pages 864--872. ACM Press, 2005. Google ScholarDigital Library
A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, number 5439 in Volume 286, pages 509--512, 1999.Google Scholar
B. Bollobas and O. Riordan. Handbook of Graphs and Networks: Mathematical results on scale-free graphs, pages 1--32. S. Bornholdt, H. Schuster (eds), Wiley-VCH, 2002.Google Scholar
B. Bollobas and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5--34, 2004. Google ScholarDigital Library
B. Bollobas, O. Riordan, J. Spencer, and G. Tusnady. The degree sequence of a scale-free random graph process.Random Structures and Algorithms, 18:279--290, 2001. Google ScholarDigital Library
M. Brautbar and M. Kearns. Local algorithms for finding interesting individuals in large networks. In Proceedings of ICS 2010, pages 188--199, 2010.Google Scholar
A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the Ninth International World-Wide Web Conference WWW9, Amsterdam 2000. http://www.www9.org/w9cdrom/160/160.html. Google ScholarDigital Library
C. Cooper. Classifying special interest groups in web graphs. In Proc. RANDOM 2002: Randomization and Approximation Techniques in Computer Science, pages 263--275, 2002. Google ScholarDigital Library
C. Cooper. The age specific degree distribution of web-graphs. Combinatorics Probability and Computing, 15:637--661, 2006. Google ScholarDigital Library
C. Cooper and A. Frieze. A general model web graphs. In Random Structures and Algorithms, vol. 22(3), pages 311--335, 2003. Google ScholarDigital Library
C. Cooper and A. Frieze. The cover time of the preferential attachment graphs.Journal of Combinatorial Theory, B(97):269--290, 2007. Google ScholarDigital Library
S. Dill, R. Kumar, K. S. Mccurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Trans. Internet Technol., 2:205--223, August 2002. Google ScholarDigital Library
M. E. E. Drinea and M. Mitzenmacher. Variations on random graph models for the web. Tech. report, Harvard University, Dept. of Computer Science, 2001.Google Scholar
M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, SIGCOMM '99, pages 251--262, New York, NY, USA, 1999. ACM. Google ScholarDigital Library
A. D. Flaxman and J. Vera. Bias reduction in traceroute sampling - towards a more accurate map of the Internet. In Proceedings of WAW 2007, pages 1--15, 2007. Google ScholarDigital Library
M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. CoRR, abs/0906.0060, 2009.Google Scholar
S. Ikeda, I. Kubo, N. Okumoto, and M. Yamashita. Impact of Local Topological Information on Random Walks on Finite Graphs. In Proceedings of ICALP 2003, pages 1054--1067. Google ScholarDigital Library
J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: measurements, models, and methods. In Proceedings of the 5th Annual International Conference on Computing and Combinatorics, COCOON'99, pages 1--17, 1999. Google ScholarDigital Library
J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 631--636, 2006. Google ScholarDigital Library
L. Lovasz. Random walks on graphs: A survey.Bolyai Society Mathematical Studies, 2:353--397, 1996.Google Scholar
R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. InReviews Of Modern Physics, vol. 74, pages 47--97, 2002.Google ScholarCross Ref
A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Evaluating sampling techniques for large dynamic graphs. Technical Report CIS-TR-08-01, Department of Computer and Information Science, University of Oregon, September 2008.Google Scholar
S. Redner. How popular is your paper? An empirical study of the citation distribution. In European Physical Journal Bvol. 4(2), pages 131--134, 1998.Google Scholar
D. Stutzbach, R. Rejaie, N.G. Duffield, S. Sen and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement - IMC 2006, pages 27--40, 2006. Google ScholarDigital Library

Index Terms

Recommendations

Fast Random Walks on Finite Graphs and Graph Topological Information
ICNC '11: Proceedings of the 2011 Second International Conference on Networking and Computing

A random walk on a graph is a process in which a particle on a vertex repeatedly moves to its adjacent vertex according to transition probability, which is given in advance. The behavior of random walks depend on its transition probability, and the ``...
Read More
A fast algorithm to find all high degree vertices in graphs with a power law degree sequence
WAW'12: Proceedings of the 9th international conference on Algorithms and Models for the Web Graph

We develop a fast method for finding all high degree vertices of a connected graph with a power law degree sequence. The method uses a biassed random walk, where the bias is a function of the power law c of the degree sequence.

Let G(t) be a t-vertex ...
Read More
Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems

Our objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN:9781450312301
DOI:10.1145/2187980
General Chairs:
Alain Mille
Université de Lyon, France
,
Fabien Gandon
INRIA, France
,
Jacques Misselis
HP, France
,
Program Chairs:
Michael Rabinovich
Case Western Reserve University, USA
,
Steffen Staab
University of Koblenz-Landau, Germany
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 16 April 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph sampling
random walks
weighted random walks
Qualifiers
- tutorial
Conference

Acceptance Rates
Overall Acceptance Rate1,899of8,196submissions,23%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 203
  Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A fast algorithm to find all high degree vertices in power law graphs

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web

ABSTRACT

References

Cited By

Index Terms

Recommendations

Fast Random Walks on Finite Graphs and Graph Topological Information

A fast algorithm to find all high degree vertices in graphs with a power law degree sequence

Walking on a graph with a magnifying glass: stratified sampling via weighted random walks