DOI: 10.1145/1008992.1009044
Article

Text classification and named entities for new event detection

Published: 25 July 2004

Abstract

New Event Detection is a challenging task that still offers scope for great improvement after years of effort. In this paper we show how performance on New Event Detection (NED) can be improved by the use of text classification techniques as well as by using named entities in a new way. We explore modifications to the document representation in a vector space-based NED system. We also show that addressing named entities preferentially is useful only in certain situations. A combination of all the above results in a multi-stage NED system that performs much better than baseline single-stage NED systems.
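
As a rough illustration of the kind of baseline single-stage, vector space-based NED system the abstract refers to, the sketch below flags a story as a new event when its highest cosine similarity to any earlier story falls below a threshold. This is an assumption-laden sketch, not the paper's system: the function names, the whitespace tokenizer, the raw term-frequency weighting (no IDF, no named-entity handling, no classification stage), and the threshold value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    """Map lowercased whitespace tokens to raw term counts (illustrative weighting only)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_new_events(stories, threshold=0.5):
    """Process stories in arrival order; return one (max_similarity, is_new) pair per story.

    A story is flagged as new when no earlier story is at least
    `threshold` similar to it. The threshold is a hypothetical value.
    """
    seen = []      # vectors of all stories processed so far
    results = []
    for text in stories:
        vec = tf_vector(text)
        # The first story has nothing to compare against, so its similarity is 0.
        max_sim = max((cosine(vec, old) for old in seen), default=0.0)
        results.append((max_sim, max_sim < threshold))
        seen.append(vec)
    return results
```

Called on a short stream such as an earthquake report, a follow-up on the same earthquake, and an unrelated election report, the first and third stories share too few terms with anything earlier and are flagged as new, while the follow-up is not.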

Information

Published In

SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
July 2004
624 pages
ISBN:1581138814
DOI:10.1145/1008992
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. named entities
  2. new event detection
  3. text classification
  4. topic detection and tracking

Qualifiers

  • Article

Conference

SIGIR04

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 20
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 18 Feb 2025

Citations

Cited By

  • (2024) Global News Synchrony and Diversity During the Start of the COVID-19 Pandemic. Proceedings of the ACM Web Conference 2024, pp. 2639-2650. DOI: 10.1145/3589334.3645645. Online publication date: 13-May-2024
  • (2023) Experimental Study of Morphological Analyzers for Topic Categorization in News Articles. Applied Sciences, 13:19 (10572). DOI: 10.3390/app131910572. Online publication date: 22-Sep-2023
  • (2023) A two-layer BiLSTM model with linear gating for Chinese named entity recognition. 2023 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. DOI: 10.1109/IJCNN54540.2023.10191631. Online publication date: 18-Jun-2023
  • (2023) Filter feature selection methods for text classification: a review. Multimedia Tools and Applications, 83:1, pp. 2053-2091. DOI: 10.1007/s11042-023-15675-5. Online publication date: 11-May-2023
  • (2022) Topic Detection and Tracking Towards Determining Public Agenda Items. Machine Learning for Societal Improvement, Modernization, and Progress, pp. 158-179. DOI: 10.4018/978-1-6684-4045-2.ch008. Online publication date: 24-Jun-2022
  • (2022) Multi-Modal Topic Model Based on Word Rank and Relevance Semantic for Social Events Classification. Journal of Computer-Aided Design & Computer Graphics, 34:10, pp. 1477-1488. DOI: 10.3724/SP.J.1089.2022.19746. Online publication date: 29-Dec-2022
  • (2022) GeoClust. Expert Systems with Applications: An International Journal, 210:C. DOI: 10.1016/j.eswa.2022.118461. Online publication date: 30-Dec-2022
  • (2022) Event prediction from news text using subgraph embedding and graph sequence mining. World Wide Web, 25:6, pp. 2403-2428. DOI: 10.1007/s11280-021-01002-1. Online publication date: 28-Feb-2022
  • (2022) A novel framework for multiclass supervised classification of location-sensitive events. Multimedia Tools and Applications, 82:7, pp. 9667-9692. DOI: 10.1007/s11042-021-11842-8. Online publication date: 16-Feb-2022
  • (2022) Word-level human interpretable scoring mechanism for novel text detection using Tsetlin Machines. Applied Intelligence, 52:15, pp. 17465-17489. DOI: 10.1007/s10489-022-03281-1. Online publication date: 2-Apr-2022
