skip to main content
10.1145/2187980.2188235acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
tutorial

A fast algorithm to find all high degree vertices in power law graphs

Published:16 April 2012Publication History

ABSTRACT

Sampling from large graphs is an area which is of great interest, particularly with the recent emergence of huge structures such as Online Social Networks. These often contain hundreds of millions of vertices and billions of edges. The large size of these networks makes it computationally expensive to obtain structural properties of the underlying graph by exhaustive search. If we can estimate these properties by taking small but representative samples from the network, then size is no longer a problem.

In this paper we develop an analysis of random walks, a commonly used method of sampling from networks. We present a method of biassing the random walk to acquire a complete sample of high degree vertices of social networks, or similar graphs. The preferential attachment model is a common method to generate graphs with a power law degree sequence. For this model, we prove that this sampling method is successful with high probability. We also make experimental studies of the method on various real world networks.

For t-vertex graphs G(t) generated by a preferential attachment process, we analyze a biassed random walk which makes transitions along undirected edges {x,y} proportional to [d(x)d(y)]b, where d(x) is the degree of vertex x and b > 0 is a constant parameter. Let S(a) be the set of all vertices of degree at least ta in G(t). We show that for some b approx 2/3, if the biassed random walk starts at an arbitrary vertex of S(a), then with high probability the set S(a) can be discovered completely in ~O(t1-(4/3)a+d) steps, where d is a very small positive constant. The notation ~O ignores poly-log t factors.

The preferential attachment process generates graphs with power law 3, so the above example is a special case of this result. For graphs with degree sequence power law c>2 generated by a generalized preferential attachment process, a random walk with transitions along undirected edges {x,y} proportional to (d(x)d(y))(c-2)/2, discovers the set S(a) completely in ~O(t1-a(c-2)+d) steps with high probability. The cover time of the graph is ~O(t).

Our results say that if we search preferential attachment graphs with a bias b=(c-2)/2 proportional to the power law c then, (i) we can find all high degree vertices quickly, and (ii) the time to discover all vertices is not much higher than in the case of a simple random walk. We conduct experimental tests on generated networks and real-world networks, which confirm these two properties.

References

  1. D. Achlioptas, A. Clauset, D. Kempe, and C. Moore. On the bias of traceroute sampling: or, power-law degree distributions in regular graphs.J. ACM, 56(4), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Aldous and J. A. Fill. Reversible Markov chains and random walks on graphs. http://stat-www.berkeley.edu/pub/users/aldous/RWG/book.html, 1995.Google ScholarGoogle Scholar
  3. R. Baeza-Yates, C. Castillo, M. Marin, and A. Rodriguez. Crawling a country: Better strategies than breadth-first for web page ordering. In Proceedings of the 14th international conference on World Wide Web / Industrial and Practical Experience Track, pages 864--872. ACM Press, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, number 5439 in Volume 286, pages 509--512, 1999.Google ScholarGoogle Scholar
  5. B. Bollobas and O. Riordan. Handbook of Graphs and Networks: Mathematical results on scale-free graphs, pages 1--32. S. Bornholdt, H. Schuster (eds), Wiley-VCH, 2002.Google ScholarGoogle Scholar
  6. B. Bollobas and O. Riordan. The diameter of a scale-free random graph. Combinatorica, 24:5--34, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. B. Bollobas, O. Riordan, J. Spencer, and G. Tusnady. The degree sequence of a scale-free random graph process.Random Structures and Algorithms, 18:279--290, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Brautbar and M. Kearns. Local algorithms for finding interesting individuals in large networks. In Proceedings of ICS 2010, pages 188--199, 2010.Google ScholarGoogle Scholar
  9. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In Proceedings of the Ninth International World-Wide Web Conference WWW9, Amsterdam 2000. http://www.www9.org/w9cdrom/160/160.html. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. C. Cooper. Classifying special interest groups in web graphs. In Proc. RANDOM 2002: Randomization and Approximation Techniques in Computer Science, pages 263--275, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. C. Cooper. The age specific degree distribution of web-graphs. Combinatorics Probability and Computing, 15:637--661, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. C. Cooper and A. Frieze. A general model web graphs. In Random Structures and Algorithms, vol. 22(3), pages 311--335, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Cooper and A. Frieze. The cover time of the preferential attachment graphs.Journal of Combinatorial Theory, B(97):269--290, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Dill, R. Kumar, K. S. Mccurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Trans. Internet Technol., 2:205--223, August 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. M. E. E. Drinea and M. Mitzenmacher. Variations on random graph models for the web. Tech. report, Harvard University, Dept. of Computer Science, 2001.Google ScholarGoogle Scholar
  16. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the Internet topology. In Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication, SIGCOMM '99, pages 251--262, New York, NY, USA, 1999. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. D. Flaxman and J. Vera. Bias reduction in traceroute sampling - towards a more accurate map of the Internet. In Proceedings of WAW 2007, pages 1--15, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. A walk in Facebook: Uniform sampling of users in online social networks. CoRR, abs/0906.0060, 2009.Google ScholarGoogle Scholar
  19. S. Ikeda, I. Kubo, N. Okumoto, and M. Yamashita. Impact of Local Topological Information on Random Walks on Finite Graphs. In Proceedings of ICALP 2003, pages 1054--1067. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins. The Web as a graph: measurements, models, and methods. In Proceedings of the 5th Annual International Conference on Computing and Combinatorics, COCOON'99, pages 1--17, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Leskovec and C. Faloutsos. Sampling from large graphs. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 631--636, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Lovasz. Random walks on graphs: A survey.Bolyai Society Mathematical Studies, 2:353--397, 1996.Google ScholarGoogle Scholar
  23. R. Albert and A.-L. Barabasi. Statistical mechanics of complex networks. InReviews Of Modern Physics, vol. 74, pages 47--97, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  24. A. H. Rasti, M. Torkjazi, R. Rejaie, N. Duffield, W. Willinger, and D. Stutzbach. Evaluating sampling techniques for large dynamic graphs. Technical Report CIS-TR-08-01, Department of Computer and Information Science, University of Oregon, September 2008.Google ScholarGoogle Scholar
  25. S. Redner. How popular is your paper? An empirical study of the citation distribution. In European Physical Journal Bvol. 4(2), pages 131--134, 1998.Google ScholarGoogle Scholar
  26. D. Stutzbach, R. Rejaie, N.G. Duffield, S. Sen and W. Willinger. On unbiased sampling for unstructured peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement - IMC 2006, pages 27--40, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A fast algorithm to find all high degree vertices in power law graphs

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in
                  • Published in

                    cover image ACM Other conferences
                    WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
                    April 2012
                    1250 pages
                    ISBN:9781450312301
                    DOI:10.1145/2187980

                    Copyright © 2012 ACM

                    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 16 April 2012

                    Permissions

                    Request permissions about this article.

                    Request Permissions

                    Check for updates

                    Qualifiers

                    • tutorial

                    Acceptance Rates

                    Overall Acceptance Rate1,899of8,196submissions,23%

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader