Abstract
Most previous analysis of Twitter user behavior has focused on individual information cascades and the social followers graph, in which the nodes for two users are connected if one follows the other. We instead study aggregate user behavior and the retweet graph with a focus on quantitative descriptions. We find that the lifetime tweet distribution is a type-II discrete Weibull stemming from a power law hazard function, that the tweet rate distribution, although asymptotically power law, exhibits a lognormal cutoff over finite sample intervals, and that the inter-tweet interval distribution is a power law with exponential cutoff. The retweet graph is small-world and scale-free, like the social graph, but is less disassortative and has much stronger clustering. These differences are consistent with the retweet graph capturing the real-world social relationships of users, and the trust between them, better than the social graph does. Beyond understanding and modeling human communication patterns and social networks, we discuss applications to alternative, decentralized microblogging systems, both for predicting real-world performance and for detecting spam.
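To make the link between a power law hazard function and the type-II discrete Weibull concrete, the following is a minimal sketch rather than the authors' fitting procedure. It assumes the standard type-II parameterization, in which the hazard (the probability that a user stops after exactly x lifetime tweets, given that they reached x) is h(x) = c·x^(β−1). The function name and the parameter values `c` and `beta` are illustrative assumptions, not estimates from the paper.

```python
def type2_discrete_weibull_pmf(c, beta, n_max):
    """Return [p(1), ..., p(n_max)] for a type-II discrete Weibull.

    Assumed hazard (power law in x): h(x) = c * x**(beta - 1),
    clipped to 1 so that it remains a valid probability.
    The PMF follows from p(x) = h(x) * prod_{k<x} (1 - h(k)).
    """
    pmf = []
    survival = 1.0  # P(X >= x): probability of reaching count x
    for x in range(1, n_max + 1):
        hazard = min(1.0, c * x ** (beta - 1))
        pmf.append(survival * hazard)   # stop exactly at x
        survival *= 1.0 - hazard        # continue past x
    return pmf


if __name__ == "__main__":
    # Illustrative values only: beta < 1 gives a decreasing hazard,
    # i.e. the more a user has tweeted, the less likely they are to stop.
    probs = type2_discrete_weibull_pmf(c=0.3, beta=0.5, n_max=10)
    for x, p in enumerate(probs, start=1):
        print(x, round(p, 4))
```

With `beta = 1` the hazard is constant and the distribution reduces to the geometric distribution, which is a convenient sanity check on the recursion.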
Index Terms
- Aggregate Characterization of User Behavior in Twitter and Analysis of the Retweet Graph