research-article
DOI: 10.1145/2566486.2568041

A time-based collective factorization for topic discovery and monitoring in news

Published: 07 April 2014

Abstract

Discovering and tracking topic shifts in news is a new challenge for today's applications. Topics evolve, emerge and fade, making it more difficult for the journalist (or the press consumer) to make sense of the news. For instance, the current Syrian chemical crisis was the starting point of the UN-Russian initiative and also of the revival of the US-France alliance. A topical mapping that represents how topics evolve over time would help to contextualize information. As far as we know, few topic tracking systems can provide such temporal topic connections. In this paper, we introduce a novel framework, inspired by Collective Factorization, for online topic discovery that connects topics across time slots. The framework jointly learns how topics evolve and their time dependencies. It lets the user control, through a single hyper-parameter, the tradeoff between accumulated past knowledge and currently observed data. We show, on semi-synthetic datasets and on Yahoo News articles, that our method is competitive with state-of-the-art techniques while providing a simple way to monitor topic evolution (including emerging and disappearing topics).
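To make the role of a single past-versus-present hyper-parameter concrete, here is a minimal sketch in Python with NumPy. It is not the paper's algorithm. It assumes a simple temporally regularized non-negative matrix factorization that minimizes ||X - W H||^2 + mu * ||H - M H_prev||^2 with standard multiplicative updates. The function name `temporal_nmf` and the symbols `X`, `W`, `H`, `M`, `H_prev` and `mu` are all hypothetical names introduced for illustration.

```python
import numpy as np


def temporal_nmf(X, H_prev, k, mu=1.0, n_iter=200, eps=1e-9, seed=0):
    """Illustrative sketch only (assumed objective, not the paper's method).

    X      : (n_docs, n_terms) non-negative document-term matrix of the current slot
    H_prev : (k_prev, n_terms) non-negative topics learned on the previous slot
    k      : number of topics for the current slot
    mu     : weight of past knowledge (mu = 0 gives plain NMF on X)

    Minimizes ||X - W H||_F^2 + mu * ||H - M H_prev||_F^2 with W, H, M >= 0,
    where M (k, k_prev) links current topics to previous ones.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, k))
    H = rng.random((k, n_terms))
    M = rng.random((k, H_prev.shape[0]))
    losses = []
    for _ in range(n_iter):
        # Lee-Seung style multiplicative updates keep every factor non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + mu * (M @ H_prev)) / (W.T @ W @ H + mu * H + eps)
        M *= (H @ H_prev.T) / (M @ H_prev @ H_prev.T + eps)
        fit = np.linalg.norm(X - W @ H) ** 2
        drift = np.linalg.norm(H - M @ H_prev) ** 2
        losses.append(fit + mu * drift)
    return W, H, M, losses


if __name__ == "__main__":
    # Synthetic data, used only to show the call signature.
    rng = np.random.default_rng(1)
    X = rng.random((30, 20))
    H_prev = rng.random((3, 20))
    W, H, M, losses = temporal_nmf(X, H_prev, k=4, mu=0.5)
    print(M.shape, losses[0], losses[-1])
```

In this sketch, a larger `mu` keeps the current topics close to non-negative combinations of the previous ones, while `mu = 0` ignores the past entirely. Under the same assumptions, a row of `M` with only small entries would suggest a topic with no clear predecessor, and a column with only small entries a previous topic that has no successor.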




Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. collective factorization
  2. online learning
  3. streaming
  4. topic discovery
  5. topic monitoring
  6. topic tracking

Qualifiers

  • Research-article


Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

  • (2023) Time-Aware Recommender Systems: A Comprehensive Survey and Quantitative Assessment of Literature. IEEE Access 11, 45586-45604. DOI: 10.1109/ACCESS.2023.3274117. Online publication date: 2023
  • (2022) A greek parliament proceedings dataset for computational linguistics and political analysis. Proceedings of the 36th International Conference on Neural Information Processing Systems, 28874-28888. DOI: 10.5555/3600270.3602363. Online publication date: 28-Nov-2022
  • (2022) TDTMF. Information Processing and Management: an International Journal 59:5. DOI: 10.1016/j.ipm.2022.103037. Online publication date: 1-Sep-2022
  • (2022) A novel temporal recommender system based on multiple transitions in user preference drift and topic review evolution. Expert Systems with Applications: An International Journal 185:C. DOI: 10.1016/j.eswa.2021.115626. Online publication date: 22-Apr-2022
  • (2022) Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis. Artificial Intelligence Review 56:6, 5133-5260. DOI: 10.1007/s10462-022-10254-w. Online publication date: 26-Oct-2022
  • (2021) Political Memes and Fake News Discourses on Instagram. Media and Communication 9:1, 276-290. DOI: 10.17645/mac.v9i1.3533. Online publication date: 3-Mar-2021
  • (2020) Probabilistic Dynamic Non-negative Group Factor Model for Multi-source Text Mining. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 1035-1043. DOI: 10.1145/3340531.3411956. Online publication date: 19-Oct-2020
  • (2020) Affinity Regularized Non-Negative Matrix Factorization for Lifelong Topic Modeling. IEEE Transactions on Knowledge and Data Engineering 32:7, 1249-1262. DOI: 10.1109/TKDE.2019.2904687. Online publication date: 1-Jul-2020
  • (2020) Dynamic Collaborative Filtering Based on User Preference Drift and Topic Evolution. IEEE Access 8, 86433-86447. DOI: 10.1109/ACCESS.2020.2993289. Online publication date: 2020
  • (2020) A Novel Event Detection Model Based on Graph Convolutional Network. Web Information Systems Engineering, 172-184. DOI: 10.1007/978-981-15-3281-8_15. Online publication date: 6-Feb-2020
