ABSTRACT
Social influence analysis on microblog networks, such as Twitter, plays a crucial role in online advertising and brand management. Most previous influence analysis schemes rely only on the links between users to find key influencers and ignore the text content that users create. As a result, they cannot differentiate a user's social influence across different aspects of life (topics). Although a few prior works do support topic-specific influence analysis, they either separate the analysis of content from the analysis of network structure, or assume that content is the only cause of links, an assumption that is clearly inappropriate for microblog networks.
To address the limitations of these approaches, we propose a novel Followship-LDA (FLDA) model, which integrates content topic discovery and social influence analysis in the same generative process. This model captures both the content-related and the content-independent reasons why a user follows another in a microblog network. We demonstrate that FLDA produces results with significantly better precision than existing approaches. Furthermore, we propose a distributed Gibbs sampling algorithm for FLDA and demonstrate that it provides excellent scalability on large clusters. Finally, we incorporate the FLDA model into a general search framework for topic-specific influencers: a user expresses his/her interest by typing a few keywords, and the framework returns a ranked list of key influencers that match that interest.
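The abstract describes FLDA only at a high level. The Python sketch below is a loose, hypothetical illustration of two of its ideas. It is not the paper's actual generative model or search scoring rule.

1. A follow link is either content-related, drawn from a topic-specific influencer distribution, or content-independent, drawn from a global popularity distribution. A per-user switch probability chooses between the two.
2. A keyword query is mapped to a posterior over topics. Users are then ranked by their expected topic-specific influence under that posterior.

All of the following are assumptions made for illustration: the names `PHI`, `SIGMA`, `PI`, `sample_followee` and `rank_influencers`, the toy numbers, and the `1e-9` floor for unseen words. In practice these distributions would be estimated from data, for example by Gibbs sampling, rather than hard-coded.

```python
import random

# Toy, hand-picked parameters for illustration only (hypothetical; not learned).
# PHI[z][w]: probability of word w under topic z.
PHI = {
    "tech": {"gpu": 0.5, "python": 0.4, "game": 0.1},
    "sports": {"soccer": 0.6, "game": 0.3, "python": 0.1},
}
# SIGMA[z][e]: probability that a content-related follow under topic z targets user e.
SIGMA = {
    "tech": {"alice": 0.7, "bob": 0.2, "carol": 0.1},
    "sports": {"alice": 0.1, "bob": 0.3, "carol": 0.6},
}
# PI[e]: probability that a content-independent follow targets user e.
PI = {"alice": 0.2, "bob": 0.6, "carol": 0.2}


def sample_followee(theta, mu, rng):
    """Draw one followee for a user under an assumed FLDA-like switch.

    theta: the user's topic mixture {topic: prob}.
    mu: assumed probability that this follow is content-related.
    """
    if rng.random() < mu:
        # Content-related: pick a topic from theta, then a user from SIGMA[topic].
        z = rng.choices(list(theta), weights=list(theta.values()))[0]
        dist = SIGMA[z]
    else:
        # Content-independent: pick a user by global popularity.
        dist = PI
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def rank_influencers(query, topic_prior=None):
    """Rank users for a keyword query with a hypothetical scoring rule.

    score(e) = sum_z p(z | query) * SIGMA[z][e],
    where p(z | query) is proportional to p(z) * prod_{w in query} PHI[z][w].
    """
    topics = list(PHI)
    prior = topic_prior or {z: 1.0 / len(topics) for z in topics}
    post = {}
    for z in topics:
        p = prior[z]
        for w in query:
            p *= PHI[z].get(w, 1e-9)  # assumed floor for words unseen in topic z
        post[z] = p
    total = sum(post.values())
    post = {z: p / total for z, p in post.items()}
    scores = {
        e: sum(post[z] * SIGMA[z].get(e, 0.0) for z in topics) for e in PI
    }
    # Highest expected influence first; ties broken by user name.
    return sorted(scores, key=lambda e: (-scores[e], e))


if __name__ == "__main__":
    print(rank_influencers(["python"]))
    print(rank_influencers(["soccer"]))
```

Under these toy numbers, the query `["python"]` ranks `alice` first, while `["soccer"]` ranks `carol` first. Separating `SIGMA` (topic-specific influence) from `PI` (content-independent popularity) is what keeps a globally popular user such as `bob` from automatically topping every topic-specific ranking.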
Index Terms
- Scalable topic-specific influence analysis on microblogs
Recommendations
A Topic Aware-based Approach to Maximize Social Influence
WebMedia '14: Proceedings of the 20th Brazilian Symposium on Multimedia and the Web
The use of social networks has shown great potential for information diffusion and formation of public opinion. One key problem that has attracted researchers interest is Topic-based Influence Maximization, that refers to finding a small set of users on ...
Extracting time series variation of topic popularity in microblogs
iiWAS2018: Proceedings of the 20th International Conference on Information Integration and Web-based Applications & Services
Extracting topics and their popularities in microblogs is a promising approach to discover popular topics in the world. To challenge this task, some methods that estimate popularity of topics based on Latent Dirichlet Allocation (LDA) has been proposed. ...
Topic-Level Bursty Study for Bursty Topic Detection in Microblogs
Advances in Knowledge Discovery and Data Mining
Microblogging services, such as Twitter and Sina Weibo, have gained tremendous popularity in recent years. The huge amount of user-generated information is spread on microblogs. Such user-generated contents are a mixture of different bursty topics ...