A Graph Analytical Approach for Topic Detection

Published: 01 December 2013

Abstract

Topic detection with large and noisy data collections such as social media must address both scalability and accuracy challenges. KeyGraph is an efficient method that improves on current solutions by considering keyword co-occurrence. We show that KeyGraph has similar accuracy to state-of-the-art approaches on small, well-annotated collections, and that it can successfully filter irrelevant documents and identify events in large and noisy social media collections. An extensive evaluation using Amazon’s Mechanical Turk demonstrated the increased accuracy and high precision of KeyGraph, as well as superior runtime performance compared to other solutions.
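
To make the idea of keyword co-occurrence more concrete, the following is a minimal illustrative sketch and not the authors' KeyGraph implementation. It assumes documents are already reduced to keyword lists, links two keywords when they appear together in at least `min_count` documents, and uses connected components as a simple stand-in for a real community detection step. The function names, the `min_count` threshold, and the sample documents are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs, min_count=2):
    """Build an undirected keyword graph; an edge links two keywords
    that appear together in at least `min_count` documents."""
    pair_counts = defaultdict(int)
    for doc in docs:
        # Count each unordered keyword pair once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def keyword_clusters(graph):
    """Group keywords into connected components (an assumed, simplified
    substitute for a proper community detection algorithm)."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical toy corpus: each document is a list of extracted keywords.
docs = [
    ["earthquake", "japan", "tsunami"],
    ["earthquake", "tsunami", "warning"],
    ["election", "vote", "senate"],
    ["election", "vote", "poll"],
    ["earthquake", "japan", "vote"],
]
clusters = keyword_clusters(cooccurrence_graph(docs))
```

In this toy corpus, the single document that mentions both "earthquake" and "vote" falls below the threshold, so the two keyword groups stay separate. Each resulting cluster can be read as a candidate topic; a real system would likely weight edges and split dense graphs more carefully than this sketch does.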

Published In

ACM Transactions on Internet Technology  Volume 13, Issue 2
December 2013
70 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/2542214
Editor: Munindar P. Singh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2013
Accepted: 01 August 2013
Revised: 01 June 2013
Received: 01 March 2012
Published in TOIT Volume 13, Issue 2

Author Tags

  1. KeyGraph-based Topic Detection
  2. Topic detection
  3. community detection
  4. network analysis

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 38
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 12 Feb 2025

Cited By

  • (2024) I-Topic: An Image-text Topic Modeling Method Based on Community Detection. 2024 5th International Conference on Computer Engineering and Application (ICCEA), 797-800. DOI: 10.1109/ICCEA62105.2024.10603702. Online publication date: 12-Apr-2024
  • (2024) Mongolian-Chinese Cross-lingual Topic Detection Based on Knowledge Distillation. 2024 International Conference on Asian Language Processing (IALP), 383-388. DOI: 10.1109/IALP63756.2024.10661180. Online publication date: 4-Aug-2024
  • (2024) A popular topic detection method based on microblog images and short text information. Journal of Web Semantics 81 (100820). DOI: 10.1016/j.websem.2024.100820. Online publication date: Jul-2024
  • (2024) The impact of joint events on oil price volatility: Evidence from a dynamic graphical news analysis model. Economic Modelling 130 (106587). DOI: 10.1016/j.econmod.2023.106587. Online publication date: Jan-2024
  • (2024) Mongolian-Chinese Cross-Lingual Topic Detection Based on Knowledge Distillation and Contrastive Learning Methods. PRICAI 2024: Trends in Artificial Intelligence, 187-199. DOI: 10.1007/978-981-96-0119-6_19. Online publication date: 19-Nov-2024
  • (2024) Efficient Topic Detection Using an Adaptive Neural Network Architecture. Advances in Information Systems, Artificial Intelligence and Knowledge Management, 145-157. DOI: 10.1007/978-3-031-51664-1_10. Online publication date: 20-Jan-2024
  • (2023) Monitoring of Public Opinion on Typhoon Disaster Using Improved Clustering Model Based on Single-Pass Approach. Sage Open 13:3. DOI: 10.1177/21582440231200098. Online publication date: 29-Sep-2023
  • (2023) Real-Time Event Detection Using Self-Evolving Contextual Analysis (SECA) Approach. IEEE Access 11, 127011-127034. DOI: 10.1109/ACCESS.2023.3331219. Online publication date: 2023
  • (2023) Twitter as a predictive system: A systematic literature review. Journal of Business Research 157 (113561). DOI: 10.1016/j.jbusres.2022.113561. Online publication date: Mar-2023
  • (2023) BTD: An effective business-related hot topic detection scheme in professional social networks. Information Sciences 630, 420-442. DOI: 10.1016/j.ins.2022.12.081. Online publication date: Jun-2023