Article (DOI: 10.1145/1277741.1277909)

Clustering short texts using Wikipedia

Published: 23 July 2007

Abstract

Subscribers to popular news or blog feeds (RSS/Atom) often face information overload, as these feed sources usually deliver a large number of items periodically. One solution could be to cluster similar items in the feed reader, making the information more manageable for the user. Clustering items at the feed reader end is challenging because usually only a small part of the actual article is received through the feed. In this paper, we propose a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. Empirical results indicate that this enriched representation of text items can substantially improve the clustering accuracy when compared to the conventional bag of words representation.
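The enrichment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how Wikipedia features are retrieved or weighted, so the lookup table `TOY_CONCEPTS`, the `concept_weight` parameter, and the `w:`/`c:` feature prefixes are all hypothetical stand-ins. The sketch only shows why adding concept features might help: two short items that share no words can still share a Wikipedia-derived concept.

```python
import math
from collections import Counter

# Hypothetical stand-in for a Wikipedia lookup: term -> related article titles.
# Assumption: a real system would query an index built over Wikipedia instead.
TOY_CONCEPTS = {
    "striker": ["Association football"],
    "goalkeeper": ["Association football"],
    "penalty": ["Association football"],
    "browser": ["Web browser"],
    "firefox": ["Web browser"],
}


def bag_of_words(text):
    """Conventional representation: lowercase token counts."""
    return Counter(text.lower().split())


def enrich(text, concept_weight=1.0):
    """Bag of words plus concept features from the toy lookup table."""
    features = Counter()
    for token, count in bag_of_words(text).items():
        features["w:" + token] += count
        for title in TOY_CONCEPTS.get(token, []):
            # Each matching token votes for its related concept.
            features["c:" + title] += concept_weight * count
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


item_a = "striker scores late penalty"
item_b = "goalkeeper saves the day"

# No shared words, so the plain representation sees the items as unrelated.
plain = cosine(bag_of_words(item_a), bag_of_words(item_b))
# The shared concept feature gives the enriched vectors a nonzero similarity.
enriched = cosine(enrich(item_a), enrich(item_b))
print(plain, enriched)
```

Any clustering algorithm that works on pairwise similarities could consume the enriched vectors; which algorithm and weighting the paper actually uses is not stated in the abstract.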


Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN: 9781595935977
DOI: 10.1145/1277741

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wikipedia
  2. clustering
  3. feed reader

Qualifiers

  • Article

Conference

SIGIR07: The 30th Annual International SIGIR Conference
July 23 - 27, 2007
Amsterdam, The Netherlands

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2024) Improving German News Clustering with Contrastive Learning. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 3979-3983. DOI: 10.1145/3627673.3679944. Online publication date: 21-Oct-2024
  • (2023) Leveraging Active Learning for Failure Mode Acquisition. Sensors 23:5 (2818). DOI: 10.3390/s23052818. Online publication date: 4-Mar-2023
  • (2023) Cross-lingual Text Clustering in a Large System. Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval, pp. 1-11. DOI: 10.1145/3639233.3639356. Online publication date: 15-Dec-2023
  • (2023) CEIL: A General Classification-Enhanced Iterative Learning Framework for Text Clustering. Proceedings of the ACM Web Conference 2023, pp. 1784-1792. DOI: 10.1145/3543507.3583457. Online publication date: 30-Apr-2023
  • (2023) Dynamic Transformation of Prior Knowledge Into Bayesian Models for Data Streams. IEEE Transactions on Knowledge and Data Engineering 35:4 (3742-3750). DOI: 10.1109/TKDE.2021.3139469. Online publication date: 1-Apr-2023
  • (2023) Classification Research Based on Quantitative Expansion of Short Text Feature Correlation. 2023 2nd International Conference on Robotics, Artificial Intelligence and Intelligent Control (RAIIC), pp. 400-405. DOI: 10.1109/RAIIC59453.2023.10280844. Online publication date: 11-Aug-2023
  • (2023) Developing Unsupervised Learning Techniques for Business News Articles. 2023 6th International Conference on Advances in Science and Technology (ICAST), pp. 547-551. DOI: 10.1109/ICAST59062.2023.10454905. Online publication date: 8-Dec-2023
  • (2023) User story clustering in agile development: a framework and an empirical study. Frontiers of Computer Science 17:6. DOI: 10.1007/s11704-022-8262-9. Online publication date: 21-Jan-2023
  • (2022) Short Text Clustering Algorithms, Application and Challenges: A Survey. Applied Sciences 13:1 (342). DOI: 10.3390/app13010342. Online publication date: 27-Dec-2022
  • (2022) The overdose epidemic: a study protocol to determine whether people who use drugs can influence or shape public opinion via mass media. Health & Justice 10:1. DOI: 10.1186/s40352-022-00189-3. Online publication date: 23-Jul-2022
