Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph

Published: 12 March 2015

Abstract

Most previous analyses of Twitter user behavior have focused on individual information cascades and the social followers graph, in which the nodes for two users are connected if one follows the other. We instead study aggregate user behavior and the retweet graph, with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power-law hazard function; that the tweet rate distribution, although asymptotically power law, exhibits a lognormal cutoff over finite sample intervals; and that the inter-tweet interval distribution is a power law with exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but is less disassortative and has much stronger clustering. These differences are consistent with the retweet graph better capturing the real-world social relationships of, and trust between, users than the social graph does. Beyond understanding and modeling human communication patterns and social networks, we discuss applications to alternative, decentralized microblogging systems, both for predicting real-world performance and for detecting spam.
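
To make the link between a power-law hazard function and a type-II discrete Weibull concrete, here is a minimal sketch rather than the authors' own procedure. It assumes the standard discrete hazard definition h(x) = P(X = x | X >= x) and the type-II form h(x) = c * x^(beta - 1) for x = 1, 2, ...; the function names and the parameter values c = 0.5 and beta = 0.5 are hypothetical and are not taken from the article.

```python
def power_law_hazard(x, c, beta):
    """Assumed hazard h(x) = c * x**(beta - 1), clipped so it stays a probability."""
    return min(1.0, c * x ** (beta - 1.0))


def pmf_from_hazard(c, beta, n_max):
    """Build P(X = x) for x = 1..n_max from the hazard.

    Uses P(X = x) = h(x) * prod_{k < x} (1 - h(k)).
    Returns the list of probabilities and the leftover tail mass P(X > n_max).
    """
    pmf = []
    survival = 1.0  # P(X >= x), which is 1 at x = 1
    for x in range(1, n_max + 1):
        h = power_law_hazard(x, c, beta)
        pmf.append(survival * h)   # probability of stopping exactly at x
        survival *= 1.0 - h        # probability of continuing past x
    return pmf, survival


# Illustrative parameters only; they are not fitted to any Twitter data.
pmf, tail = pmf_from_hazard(c=0.5, beta=0.5, n_max=1000)
```

With beta < 1 the hazard decreases as x grows, so under this reading a user who has already posted many tweets becomes less likely to stop at the next one.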

        Published in

        ACM Transactions on Internet Technology, Volume 15, Issue 1
        Special Issue on Foundations of Social Computing
        February 2015
        147 pages
        ISSN: 1533-5399
        EISSN: 1557-6051
        DOI: 10.1145/2745838
        Editor: Munindar P. Singh

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 March 2015
        • Accepted: 1 November 2014
        • Revised: 1 July 2014
        • Received: 1 February 2014

        Qualifiers

        • research-article
        • Research
        • Refereed
