DOI: 10.1145/1988688.1988749
research-article

Crawling Facebook for social network analysis purposes

Published: 25 May 2011

ABSTRACT

We describe our work on collecting and analyzing massive data on the connections between participants in online social networks. Alternative approaches to social network data collection are defined and evaluated in practice against the popular Facebook Web site. Using our ad hoc, privacy-compliant crawlers, we collected two large samples comprising millions of connections; the data is anonymized and organized as an undirected graph. We also describe a set of tools we developed to analyze specific properties of such social network graphs, including, among others, the degree distribution, centrality measures, scaling laws, and the distribution of friendship.
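The abstract outlines a pipeline of crawling, anonymizing, and analyzing an undirected friendship graph. The following is a minimal sketch of one generic version of that pipeline, not the authors' crawler. It assumes a breadth-first crawl over a hypothetical in-memory adjacency map (`TOY_FRIENDS`, standing in for fetching friend lists from the live site, which this code does not do), anonymization by replacing user identifiers with salted SHA-256 digests (an assumed scheme), and a degree-distribution count over the resulting undirected edges. All names, the salt, and the toy data are illustrative assumptions.

```python
import hashlib
from collections import Counter, deque

# Hypothetical toy data: user id -> list of friend ids.
# A real crawler would fetch each friend list over the network; this is a dict.
TOY_FRIENDS = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol", "dave"],
    "carol": ["alice", "bob"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}


def anonymize(user_id: str, salt: str = "example-salt") -> str:
    """Replace a user id with a truncated salted SHA-256 digest (assumed scheme)."""
    return hashlib.sha256((salt + user_id).encode("utf-8")).hexdigest()[:16]


def bfs_crawl(seed: str, friends: dict, max_nodes: int) -> set:
    """Breadth-first crawl from `seed`, returning undirected edges between
    anonymized ids. At most `max_nodes` users are visited and expanded;
    edges to users beyond that budget are still recorded."""
    visited = {seed}
    queue = deque([seed])
    edges = set()
    while queue:
        user = queue.popleft()
        for friend in friends.get(user, []):
            # An unordered pair stores each friendship once (undirected graph).
            edges.add(frozenset((anonymize(user), anonymize(friend))))
            if friend not in visited and len(visited) < max_nodes:
                visited.add(friend)
                queue.append(friend)
    return edges


def degree_distribution(edges: set) -> dict:
    """Map each degree k to the number of nodes that have degree k."""
    degree = Counter()
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return dict(Counter(degree.values()))


if __name__ == "__main__":
    sample = bfs_crawl("alice", TOY_FRIENDS, max_nodes=10)
    print(len(sample))  # 5 undirected edges
    print(sorted(degree_distribution(sample).items()))  # [(1, 1), (2, 3), (3, 1)]
```

Hashing with a secret salt keeps raw identifiers out of the stored graph while still letting the same user map to the same node across the crawl. A small `max_nodes` turns the full traversal into a partial BFS sample.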


          Published in

          ACM Other conferences
          WIMS '11: Proceedings of the International Conference on Web Intelligence, Mining and Semantics
          May 2011
          563 pages
          ISBN:9781450301480
          DOI:10.1145/1988688

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States



          Acceptance Rates

          Overall acceptance rate: 140 of 278 submissions, 50%
