skip to main content
10.1145/2000172.2000178acmconferencesArticle/Chapter ViewAbstractPublication PagesmobisysConference Proceedingsconference-collections
research-article

Albatross sampling: robust and effective hybrid vertex sampling for social graphs

Published:28 June 2011Publication History

ABSTRACT

Nowadays, Online Social Networks (OSNs) have become dramatically popular and the study of social graphs attracts the interests of a large number of researchers. One critical challenge is the huge size of the social graph, which makes the graph analyzing or even the data crawling incredibly time consuming, and sometimes impossible to be completed. Thus, graph sampling algorithms have been introduced to obtain a smaller subgraph which reflects the properties of the original graph well. Breadth-First Sampling (BFS) is widely used in graph sampling, but it is biased towards high-degree vertices during the process of sampling. Besides, Metropolis-Hasting Random Walk (MHRW), which is proposed to get unbiased samples of the social graph, requires the graph to be well connected. In this paper, we propose a vertex sampling algorithm, so-called Albatross Sampling (AS), which introduces random jump strategy into MHRW during the sampling process. The embedded random jump makes the sampling procedure more flexible and avoids being trapped in some locally well connected part. According to our evaluation, we find that no matter using tightly or loosely connected graphs, AS performs significantly better than MHRW and BFS. On the one hand, AS estimates the degree distribution with much lower Normalized Mean Square Error (NMSE) by consuming the same resource budget. On the other hand, to get an acceptable estimation of the degree distribution, AS requires much less resource budget.

References

  1. Project of Sampling Social Graphs. http://code.google.com/p/sampling-social-graphs/, 2011.Google ScholarGoogle Scholar
  2. Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of Topological Characteristics of Huge Online Social Networking Services. In Proc. of WWW, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. ASU. Social Computing Data Repository at Arizona State University. http://socialcomputing.asu.edu/pages/home.Google ScholarGoogle Scholar
  4. F. Benevenuto, T. Rodrigues, M. Cha, and V. Almeida. Characterizing User Behavior in Online Social Networks. In Proc. of ACM IMC, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. Diestel. Graph Theory. Springer-Verlag, 2010.Google ScholarGoogle Scholar
  6. N. Eagle, A. S. Pentland, and D. Lazer. Inferring friendship network structure by using mobile phone data. PNAS, 106(36):15274--15278, August 2009.Google ScholarGoogle ScholarCross RefCross Ref
  7. Geek.com. Twitter reaches 200 million users and 110 million tweets per day. http://www.geek.com/articles/news/twitter-reaches-200-million-users-and-110-million-tweets-per-day-20110120/.Google ScholarGoogle Scholar
  8. M. Gjoka. Measurement of online social networks. UC Irvine PhD Thesis, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Walking in Facebook: A Case Study of Unbiased Sampling of Osns. In Proc. of IEEE INFOCOM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou. Practical recommendations on sampling osn users by crawling the social graph. JSAC special issue on Measurement of Internet Topologies, 2011.Google ScholarGoogle Scholar
  11. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a Social Network or a News Media? In Proc. of WWW, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Leskovec and C. Faloutsos. Sampling from Large Graphs. In Proc. of ACM SIGKDD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. A. Levin, Y. Peres, and E. L. Wilmer. Markov Chains and Mixing Times. American Mathematical Society, 2008.Google ScholarGoogle ScholarCross RefCross Ref
  14. L. Lovasz. Random walks on graphs: a survey. Combinatorics, 1993.Google ScholarGoogle Scholar
  15. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculation by fast computing machines. J. Chem. Physics, 106(21), August 1953.Google ScholarGoogle Scholar
  16. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and Analysis of Online Social Networks. In Proc. of ACM IMC, volume 106, October 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. M. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89, 2002.Google ScholarGoogle Scholar
  18. B. Ribeiro, W. Gauvin, B. Liu, and D. Towsley. On Myspace account spans and double Pareto-like distribution of friends. UMass Amherst Technical Report, (UM-CS-2010-001), January 2010.Google ScholarGoogle Scholar
  19. B. Ribeiro and D. Towsley. Estimating and Sampling Graphs with Multidimensional Random Walks. In Proc. of ACM IMC, November 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Stanford. Stanford large network dataset collection. http://snap.stanford.edu/data/index.html/.Google ScholarGoogle Scholar
  21. T. Wang, Y. Chen, Z. Zhang, T. Xu, L. Jin, P. Hui, B. Deng, and X. Li. Understanding graph sampling algorithms for social network analysis. In the 3rd ICDCS Workshop on Simplifying Complex Networks for Practitioners, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. D. J. Watts and S. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684), June 1998.Google ScholarGoogle Scholar
  23. C. Wilson, B. Boe, A. Sala, K. P. N. Puttaswamy, and B. Y. Zhao. User Interactions in Social Networks and their Implications. In Proc. of ACM EuroSys, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Albatross sampling: robust and effective hybrid vertex sampling for social graphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      HotPlanet '11: Proceedings of the 3rd ACM international workshop on MobiArch
      June 2011
      42 pages
      ISBN:9781450307420
      DOI:10.1145/2000172

      Copyright © 2011 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 28 June 2011

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate11of20submissions,55%

      Upcoming Conference

      MOBISYS '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader