TUT: a statistical model for detecting trends, topics and user interests in social media

Authors:
Xuning Tang, Drexel University, Philadelphia, PA, USA
Christopher C. Yang, Drexel University, Philadelphia, PA, USA

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management, October 2012, Pages 972–981
https://doi.org/10.1145/2396761.2396884

Published: 29 October 2012

ABSTRACT

The rapid growth of online social media sites has been accompanied by the generation of a tremendous amount of web content, and web users are shifting from data consumers to data producers. As a result, topic detection and tracking that ignores users' interests is no longer sufficient. This paper presents a statistical model that detects interpretable trends and topics from document streams, where each trend (short for trending story) corresponds to a series of continuing events or a storyline. A topic is represented by a cluster of words that frequently co-occur. A trend can contain multiple topics, and a topic can be shared by different trends. In addition, by leveraging a Recurrent Chinese Restaurant Process (RCRP), the model determines the number of trends automatically, without human intervention, so that it can better generalize to unseen data. Furthermore, the proposed model incorporates user interest to fully simulate the generation process of web content, which offers an opportunity for personalized recommendation in online social media. Experiments on three different datasets indicate that the proposed model can capture meaningful topics and trends, monitor the rise and fall of detected trends, outperform a baseline approach in terms of perplexity on held-out data, and improve user participation prediction by leveraging users' interests in different trends.
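
How an RCRP lets the number of trends grow with the data can be illustrated with a small simulation. The following is a minimal sketch, not the authors' TUT inference procedure: it only samples trend assignments from an RCRP-style prior, in which a document in the current epoch joins an existing trend with weight equal to that trend's decayed count from recent epochs plus its count so far in the current epoch, or opens a new trend with weight alpha. The function names, the exponential decay kernel, and the parameter values (`alpha`, `decay_width`, `window`) are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def decayed_prior(history, decay_width=0.5, window=3):
    """Sum exponentially decayed trend counts over the last `window` epochs.

    `history` is a list of Counters (trend id -> document count), oldest first.
    The kernel exp(-delta / decay_width) is an illustrative assumption.
    """
    prior = Counter()
    for delta, counts in enumerate(reversed(history[-window:]), start=1):
        weight = math.exp(-delta / decay_width)
        for trend, n in counts.items():
            prior[trend] += weight * n
    return prior


def rcrp_assign(num_docs, prior, alpha, next_id, rng):
    """Assign the documents of one epoch to trends.

    Returns (assignments, counts, next_id), where next_id is the id that
    the next newly created trend would receive.
    """
    counts = Counter()
    assignments = []
    for _ in range(num_docs):
        trends = sorted(set(prior) | set(counts))
        weights = [prior[t] + counts[t] for t in trends]
        # The final option opens a brand-new trend with weight alpha.
        choice = rng.choices(trends + [next_id], weights=weights + [alpha])[0]
        if choice == next_id:
            next_id += 1
        counts[choice] += 1
        assignments.append(choice)
    return assignments, counts, next_id


def simulate(epoch_sizes, alpha=1.0, seed=0):
    """Run the process over several epochs; alpha must be positive."""
    rng = random.Random(seed)
    history, all_assignments, next_id = [], [], 0
    for size in epoch_sizes:
        prior = decayed_prior(history)
        assignments, counts, next_id = rcrp_assign(size, prior, alpha, next_id, rng)
        history.append(counts)
        all_assignments.append(assignments)
    return all_assignments, next_id


if __name__ == "__main__":
    per_epoch, num_trends = simulate([5, 8, 3], alpha=1.0, seed=42)
    for epoch, trend_ids in enumerate(per_epoch):
        print(f"epoch {epoch}: {trend_ids}")
    print("trends created:", num_trends)
```

The number of trends is never passed in: it emerges from the data volume and alpha. Trends that were popular in recent epochs are more likely to attract new documents, while trends that receive no documents within the window drop out of the prior, which is one way to picture the rise and fall of trends described above.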

Index Terms

TUT: a statistical model for detecting trends, topics and user interests in social media
1. Information systems
  1. Information systems applications
    1. Data mining

Published in
CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761
General Chair: Xuewen Chen (Wayne State University, USA)
Program Chairs: Guy Lebanon (Georgia Institute of Technology), Haixun Wang (Microsoft Research Asia), Mohammed J. Zaki (Rensselaer Polytechnic Institute)
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
evolution
modeling
topic
trend
user interest
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%