Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph

Authors:
David R. Bild (University of Michigan)
Yue Liu (University of Michigan)
Robert P. Dick (University of Michigan)
Z. Morley Mao (University of Michigan)
Dan S. Wallach (Rice University)

ACM Transactions on Internet Technology, Volume 15, Issue 1, Article No. 4, pp. 1–24. https://doi.org/10.1145/2700060

Published: 12 March 2015

Abstract

Most previous analyses of Twitter user behavior have focused on individual information cascades and the social followers graph, in which the nodes for two users are connected if one follows the other. We instead study aggregate user behavior and the retweet graph with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power-law hazard function; that the tweet rate distribution, although asymptotically power law, exhibits a lognormal cutoff over finite sample intervals; and that the inter-tweet interval distribution is a power law with exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but is less disassortative and has much stronger clustering. These differences are consistent with the retweet graph better capturing the real-world social relationships of, and trust between, users than the social graph does. Beyond understanding and modeling human communication patterns and social networks, we discuss applications to alternative, decentralized microblogging systems, both predicting real-world performance and detecting spam.
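As a rough illustration of how a discrete distribution can stem from a power-law hazard function, the sketch below builds a probability mass function from an assumed hazard h(k) = c * k^(beta - 1), where h(k) is the conditional probability that the count stops at k given that it reached k. The parametrization, the function names, and the values c = 0.5 and beta = 0.5 are assumptions chosen for illustration; they are not taken from the article and are not fitted to any data.

```python
from typing import List


def power_law_hazard(k: int, c: float, beta: float) -> float:
    """Assumed hazard h(k) = c * k**(beta - 1), clipped to [0, 1]."""
    return min(1.0, max(0.0, c * k ** (beta - 1.0)))


def discrete_weibull_ii_pmf(k_max: int, c: float, beta: float) -> List[float]:
    """Return [P(X=1), ..., P(X=k_max)] built from the hazard above.

    Uses the generic identity P(X=k) = h(k) * prod_{j<k} (1 - h(j)).
    """
    pmf = []
    survival = 1.0  # P(X >= k); equals 1 at k = 1
    for k in range(1, k_max + 1):
        h = power_law_hazard(k, c, beta)
        pmf.append(survival * h)   # stop exactly at k
        survival *= 1.0 - h        # continue past k
    return pmf


# Illustrative (hypothetical) parameters: beta < 1 gives a decreasing hazard.
pmf = discrete_weibull_ii_pmf(k_max=1000, c=0.5, beta=0.5)
print(pmf[:5])
```

With beta < 1 the hazard decreases in k, so the longer a count has run, the less likely it is to stop at the next step, which produces a heavy tail. Truncating at k_max means the returned values sum to slightly less than 1; the remainder is the probability of exceeding k_max.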

Index Terms

1. Applied computing
  1. Law, social and behavioral sciences
2. Information systems
  1. Information systems applications

Published in
ACM Transactions on Internet Technology Volume 15, Issue 1
Special Issue on Foundations of Social Computing
February 2015
147 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/2745838
Editor:
Munindar P. Singh
Department of Computer Science, North Carolina State University
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 March 2015
- Accepted: 1 November 2014
- Revised: 1 July 2014
- Received: 1 February 2014
Author Tags
Social network analysis
decentralized network architectures
microblogging systems
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 73
- Total Downloads: 1,314
- Downloads (Last 12 months): 62
- Downloads (Last 6 weeks): 12