DOI: 10.1145/2396761.2396884
research-article

TUT: a statistical model for detecting trends, topics and user interests in social media

Published: 29 October 2012

ABSTRACT

The rapid development of online social media sites is accompanied by the generation of a tremendous amount of web content. Web users are shifting from data consumers to data producers. As a result, topic detection and tracking that does not take users' interests into account is no longer sufficient. This paper presents a statistical model that can detect interpretable trends and topics from document streams, where each trend (short for trending story) corresponds to a series of continuing events or a storyline. A topic is represented by a cluster of words that frequently co-occur. A trend can contain multiple topics, and a topic can be shared by different trends. In addition, by leveraging a Recurrent Chinese Restaurant Process (RCRP), the number of trends in our model can be determined automatically without human intervention, so that our model can better generalize to unseen data. Furthermore, our proposed model incorporates user interest to fully simulate the generation process of web content, which offers the opportunity for personalized recommendation in online social media. Experiments on three different datasets indicate that our proposed model can capture meaningful topics and trends, monitor the rise and fall of detected trends, outperform the baseline approach in terms of perplexity on a held-out dataset, and improve user participation prediction by leveraging users' interests in different trends.
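To illustrate how an RCRP lets the number of trends grow without being fixed in advance, the following is a minimal sketch of the RCRP prior alone. It is not the inference procedure of the paper: it ignores words, topics, and users, and it assumes the common one-epoch form in which a document joins an existing trend with probability proportional to that trend's document count in the previous epoch plus its count so far in the current epoch, or starts a new trend with probability proportional to a concentration parameter `alpha`. Some formulations spread the history over several epochs with decaying weights. The function names, the `next_id` bookkeeping, and the example counts are hypothetical.

```python
import random


def rcrp_probs(prev_counts, curr_counts, alpha):
    """Prior probabilities for the next document in the current epoch.

    prev_counts: dict trend_id -> documents in the previous epoch
    curr_counts: dict trend_id -> documents assigned so far in this epoch
    alpha: concentration parameter (mass reserved for a new trend)
    Returns (dict trend_id -> probability, probability of a new trend).
    """
    ids = sorted(set(prev_counts) | set(curr_counts))
    weights = [prev_counts.get(k, 0) + curr_counts.get(k, 0) for k in ids]
    total = sum(weights) + alpha
    existing = {k: w / total for k, w in zip(ids, weights)}
    return existing, alpha / total


def sample_epoch(prev_counts, n_docs, alpha, next_id, rng):
    """Sample trend assignments for n_docs documents from the prior.

    next_id is the first unused trend id (an assumption of this sketch,
    so that ids of trends that died out earlier are never reused).
    Returns (counts for this epoch, updated next_id).
    """
    curr_counts = {}
    for _ in range(n_docs):
        existing, p_new = rcrp_probs(prev_counts, curr_counts, alpha)
        ids = list(existing) + [next_id]
        probs = list(existing.values()) + [p_new]
        k = rng.choices(ids, weights=probs, k=1)[0]
        if k == next_id:
            next_id += 1  # a new trend was created
        curr_counts[k] = curr_counts.get(k, 0) + 1
    return curr_counts, next_id


# Hypothetical example: two trends seen in the previous epoch.
probs, p_new = rcrp_probs({0: 3, 1: 1}, {1: 2}, alpha=2.0)
counts, next_id = sample_epoch({0: 3, 1: 1}, n_docs=20, alpha=2.0,
                               next_id=2, rng=random.Random(0))
```

In this sketch a trend that received no documents in the previous epoch and none so far in the current one gets zero weight, which is how the prior lets trends die out while `alpha` keeps open the possibility of new ones.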

Published in

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761

Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
