DOI: 10.1145/2837185.2843853
short-paper

INTWEEMS: a framework for incremental clustering of tweet streams

Published: 11 December 2015

Abstract

Twitter is a popular micro-blogging service for sharing short messages called tweets, which offer a view of public opinion on various topics. Currently, Twitter presents search results as a flat list, sorted either by popularity or by recency. This presentation makes it hard to identify the diverse latent topics that the tweets cover. One way to better understand the tweets is to cluster them so that each cluster represents a latent topic. Doing so requires clustering algorithms that can handle streaming data and map new data into existing clusters. To address this, we propose a framework called INTWEEMS (INcremental clustering of TWEEt streaMS), which clusters tweets in real time, incrementally fits new tweets into existing clusters, and visualizes the clusters to help identify latent topics and sub-topics within the tweets. This paper describes the INTWEEMS framework and its implementation.
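
The abstract does not say which algorithm INTWEEMS uses, so the following is only a minimal sketch of the general idea of fitting each new tweet into existing clusters as it arrives. It assumes a simple threshold-based scheme over bag-of-words vectors with cosine similarity. The class name `IncrementalClusterer`, the helper functions, the `threshold` value, and the sample tweets are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (hashtags and mentions kept)."""
    return re.findall(r"[#@]?\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    if not a or not b:
        return 0.0
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


class IncrementalClusterer:
    """Hypothetical threshold-based clusterer; an assumption, not the INTWEEMS algorithm."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.centroids = []         # summed term counts, one Counter per cluster
        self.members = []           # tweets assigned to each cluster

    def add(self, tweet):
        """Assign one tweet to the most similar cluster, or open a new one."""
        vec = Counter(tokenize(tweet))
        best, best_sim = None, 0.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Update the cluster summary in place; earlier tweets are not revisited.
            self.centroids[best].update(vec)
            self.members[best].append(tweet)
            return best
        self.centroids.append(vec)
        self.members.append([tweet])
        return len(self.centroids) - 1


clusterer = IncrementalClusterer(threshold=0.3)
stream = [
    "Heavy rain floods the city centre",
    "New phone launch event tonight",
    "City centre roads closed after heavy rain",
]
labels = [clusterer.add(tweet) for tweet in stream]
print(labels)  # [0, 1, 0]
```

Each tweet is processed once and only the per-cluster summaries are updated, which is what makes the scheme incremental. A real system would likely also need stop-word removal, term weighting, and some way to merge or retire clusters.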

Published In

iiWAS '15: Proceedings of the 17th International Conference on Information Integration and Web-based Applications & Services
December 2015
704 pages
ISBN: 9781450334914
DOI: 10.1145/2837185

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Twitter
  2. incremental clustering
  3. realtime clustering
  4. social media

Qualifiers

  • Short-paper

Conference

iiWAS '15

Cited By

  • (2024) An incremental clustering algorithm based on semantic concepts. Knowledge and Information Systems 66:6 (3303-3335). DOI: 10.1007/s10115-024-02063-0. Online publication date: 1-Jun-2024
  • (2021) CommuniMents. Research Anthology on Strategies for Using Social Media as a Service and Tool in Business (382-404). DOI: 10.4018/978-1-7998-9020-1.ch019. Online publication date: 2021
  • (2019) The role of collaborative tagging and ontologies in emerging semantic of web resources. Computing 101:10 (1489-1511). DOI: 10.1007/s00607-019-00704-9. Online publication date: 1-Oct-2019
  • (2017) CommuniMents. International Journal on Semantic Web and Information Systems 13:2 (87-108). DOI: 10.4018/IJSWIS.2017040106. Online publication date: Apr-2017
  • (2017) MFS-LDA: a multi-feature space tag recommendation model for cold start problem. Program 51:3 (218-234). DOI: 10.1108/PROG-01-2017-0002. Online publication date: 5-Sep-2017
