ABSTRACT
Sampling from large graphs is an area which is of great interest, particularly with the recent emergence of huge structures such as Online Social Networks. These often contain hundreds of millions of vertices and billions of edges. The large size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer a problem.
In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biassing the random walk to acquire a complete sample of high degree vertices of social networks, or similar graphs. The preferential attachment model is a common method to generate graphs with a power law degree sequence. For this model, we prove that this sampling method is successful with high probability. We also make experimental studies of the method on various real world networks.
For t-vertex graphs G(t) generated by a preferential attachment process, we analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to [d(x)d(y)]b, where d(x) is the degree of vertex x and b > 0 is a constant parameter. Let S(a) be the set of all vertices of degree at least ta in G(t). We show that for some b approx 2/3, if the biassed random walk starts at an arbitrary vertex of S(a), then with high probability the set S(a) can be discovered completely in ~O(t1-(4/3)a+d) steps, where d is a very small positive constant. The notation ~O ignores poly-log t factors.
The preferential attachment process generates graphs with power law 3, so the above example is a special case of this result. For graphs with degree sequence power law c>2 generated by a generalized preferential attachment process, a random walk with transitions along undirected edges {x,y} proportional to (d(x)d(y))(c-2)/2, discovers the set S(a) completely in ~O(t1-a(c-2)+d) steps with high probability. The cover time of the graph is ~O(t).
Our results say that if we search preferential attachment graphs with a bias b=(c-2)/2 proportional to the power law c then, (i) we can find all high degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.
- D. Achlioptas, A. Clauset, D. Kempe, and C. Moore. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs.J. ACM, 56(4), 2009. Google ScholarDigital Library
- D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs. http://stat-www.berkeley.edu/pub/users/aldous/RWG/book.html, 1995.Google Scholar
- R. Baeza-Yates, C. Castillo, M. Marin, and A. Rodriguez. Crawling a country: Better strategies than breadth-first for web page ordering. In Proceedings of the 14th international conference on World Wide Web / Industrial and Practical Experience Track, pages 864--872. ACM Press, 2005. Google ScholarDigital Library
- A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, number 5439 in Volume 286, pages 509--512, 1999.Google Scholar
- B. Bollobas and O. Riordan. Handbook of Graphs and Networks: Mathematical results on scale-free graphs, pages 1--32. S. Bornholdt, H. Schuster (eds), Wiley-VCH, 2002.Google Scholar
- B. Bollobas and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5--34, 2004. Google ScholarDigital Library
- B. Bollobas, O. Riordan, J. Spencer, and G. Tusnady. The degree sequence of a scale-free random graph process.Random Structures and Algorithms, 18:279--290, 2001. Google ScholarDigital Library
- M. Brautbar and M. Kearns. Local algorithms for finding interesting individuals in large networks. In Proceedings of ICS 2010, pages 188--199, 2010.Google Scholar
- A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the Ninth International World-Wide Web Conference WWW9, Amsterdam 2000. http://www.www9.org/w9cdrom/160/160.html. Google ScholarDigital Library
- C. Cooper. Classifying special interest groups in web graphs. In Proc. RANDOM 2002: Randomization and Approximation Techniques in Computer Science, pages 263--275, 2002. Google ScholarDigital Library
- C. Cooper. The age specific degree distribution of web-graphs. Combinatorics Probability and Computing, 15:637--661, 2006. Google ScholarDigital Library
- C. Cooper and A. Frieze. A general model web graphs. In Random Structures and Algorithms, vol. 22(3), pages 311--335, 2003. Google ScholarDigital Library
- C. Cooper and A. Frieze. The cover time of the preferential attachment graphs.Journal of Combinatorial Theory, B(97):269--290, 2007. Google ScholarDigital Library
- S. Dill, R. Kumar, K. S. Mccurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Trans. Internet Technol., 2:205--223, August 2002. Google ScholarDigital Library
- M. E. E. Drinea and M. Mitzenmacher. Variations on random graph models for the web. Tech. report, Harvard University, Dept. of Computer Science, 2001.Google Scholar
- M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, SIGCOMM '99, pages 251--262, New York, NY, USA, 1999. ACM. Google ScholarDigital Library
- A. D. Flaxman and J. Vera. Bias reduction in traceroute sampling - towards a more accurate map of the Internet. In Proceedings of WAW 2007, pages 1--15, 2007. Google ScholarDigital Library
- M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. CoRR, abs/0906.0060, 2009.Google Scholar
- S. Ikeda, I. Kubo, N. Okumoto, and M. Yamashita. Impact of Local Topological Information on Random Walks on Finite Graphs. In Proceedings of ICALP 2003, pages 1054--1067. Google ScholarDigital Library
- J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: measurements, models, and methods. In Proceedings of the 5th Annual International Conference on Computing and Combinatorics, COCOON'99, pages 1--17, 1999. Google ScholarDigital Library
- J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 631--636, 2006. Google ScholarDigital Library
- L. Lovasz. Random walks on graphs: A survey.Bolyai Society Mathematical Studies, 2:353--397, 1996.Google Scholar
- R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. InReviews Of Modern Physics, vol. 74, pages 47--97, 2002.Google ScholarCross Ref
- A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Evaluating sampling techniques for large dynamic graphs. Technical Report CIS-TR-08-01, Department of Computer and Information Science, University of Oregon, September 2008.Google Scholar
- S. Redner. How popular is your paper? An empirical study of the citation distribution. In European Physical Journal Bvol. 4(2), pages 131--134, 1998.Google Scholar
- D. Stutzbach, R. Rejaie, N.G. Duffield, S. Sen and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement - IMC 2006, pages 27--40, 2006. Google ScholarDigital Library
Index Terms
- A fast algorithm to find all high degree vertices in power law graphs
Recommendations
Fast Random Walks on Finite Graphs and Graph Topological Information
ICNC '11: Proceedings of the 2011 Second International Conference on Networking and ComputingA random walk on a graph is a process in which a particle on a vertex repeatedly moves to its adjacent vertex according to transition probability, which is given in advance. The behavior of random walks depend on its transition probability, and the ``...
A fast algorithm to find all high degree vertices in graphs with a power law degree sequence
WAW'12: Proceedings of the 9th international conference on Algorithms and Models for the Web GraphWe develop a fast method for finding all high degree vertices of a connected graph with a power law degree sequence. The method uses a biassed random walk, where the bias is a function of the power law c of the degree sequence.
Let G(t) be a t-vertex ...
Walking on a graph with a magnifying glass: stratified sampling via weighted random walks
SIGMETRICS '11: Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systemsOur objective is to sample the node set of a large unknown graph via crawling, to accurately estimate a given metric of interest. We design a random walk on an appropriately defined weighted graph that achieves high efficiency by preferentially crawling ...
Comments