ABSTRACT
We consider the problem of analyzing word trajectories in both the time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated as a time series, where each element is the document frequency - inverse document frequency (DFIDF) score at one time point. In this paper, we 1) applied spectral analysis to categorize features by event characteristics: important and less-reported, periodic and aperiodic; 2) modeled aperiodic features with a Gaussian density and periodic features with Gaussian mixture densities, and subsequently detected each feature's burst with a truncated Gaussian approach; 3) proposed an unsupervised greedy event detection algorithm to detect both aperiodic and periodic events. All of the above methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
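To make the first step concrete, the following is a minimal sketch of how a word's trajectory might be built and labeled as periodic or aperiodic from its spectrum. The abstract does not give the formulas, so everything here is an assumption for illustration: the DFIDF weighting (daily document-frequency ratio scaled by a corpus-wide log inverse document frequency), the use of a plain periodogram, the rule "dominant period at most half the series length means periodic", and all function names.

```python
import cmath
import math


def dfidf_trajectory(df_per_day, docs_per_day):
    """Assumed DFIDF score per day: (DF(t) / N(t)) * log(N_total / DF_total)."""
    total_docs = sum(docs_per_day)
    total_df = sum(df_per_day)
    if total_df == 0:
        return [0.0] * len(df_per_day)
    idf = math.log(total_docs / total_df)
    return [(df / n if n else 0.0) * idf
            for df, n in zip(df_per_day, docs_per_day)]


def periodogram(series):
    """Squared DFT magnitude at frequencies k/T for k = 1 .. T//2.

    Uses a naive O(T^2) transform to stay dependency-free; k = 0 (the mean)
    is skipped on purpose.
    """
    T = len(series)
    power = []
    for k in range(1, T // 2 + 1):
        coeff = sum(y * cmath.exp(-2j * math.pi * k * t / T)
                    for t, y in enumerate(series))
        power.append(abs(coeff) ** 2)
    return power


def dominant_period(series):
    """Return (period in time steps, power) at the strongest frequency."""
    power = periodogram(series)
    best = max(range(len(power)), key=power.__getitem__)
    k = best + 1  # power[0] corresponds to k = 1
    return len(series) / k, power[best]


def is_periodic(series):
    """Assumed rule: periodic if the dominant period fits at least twice."""
    period, _ = dominant_period(series)
    return period <= len(series) / 2


# Hypothetical 28-day counts: a word with a weekly cycle and a one-off burst.
docs = [100] * 28
weekly_word = [18, 15, 8, 3, 3, 8, 15] * 4
burst_word = [0] * 10 + [2, 6, 9, 6, 2] + [0] * 13

weekly_series = dfidf_trajectory(weekly_word, docs)
burst_series = dfidf_trajectory(burst_word, docs)
```

In this sketch the power at the dominant frequency could also serve as a rough importance score for separating important from less-reported words, but that threshold is not specified in the abstract and is left out.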
- Apache lucene-core 2.0.0, http://lucene.apache.org.
- Google news alerts, http://www.google.com/alerts.
- Reuters corpus, http://www.reuters.com/researchandstandards/corpus/.
- J. Allan. Topic Detection and Tracking. Event-based Information Organization. Kluwer Academic Publishers, 2002.
- J. Allan, V. Lavrenko, and H. Jin. First story detection in TDT is hard. In CIKM, pages 374--381, 2000.
- J. Allan, C. Wade, and A. Bolivar. Retrieval and novelty detection at the sentence level. In SIGIR, pages 314--321, 2003.
- T. Brants, F. Chen, and A. Farahat. A system for new event detection. In SIGIR, pages 330--337, 2003.
- A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1--38, 1977.
- G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu. Parameter free bursty events detection in text streams. In VLDB, pages 181--192, 2005.
- Q. He, K. Chang, and E.-P. Lim. A model for anticipatory event detection. In ER, pages 168--181, 2006.
- Q. He, K. Chang, E.-P. Lim, and J. Zhang. Bursty feature representation for clustering text streams. In SDM, accepted, 2007.
- J. Kleinberg. Bursty and hierarchical structure in streams. In SIGKDD, pages 91--101, 2002.
- R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW, pages 159--178, 2005.
- G. Kumaran and J. Allan. Text classification and named entities for new event detection. In SIGIR, pages 297--304, 2004.
- Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In SIGKDD, pages 198--207, 2005.
- W. D. Penny. Kullback-Leibler divergences of normal, gamma, Dirichlet and Wishart densities. Technical report, 2001.
- N. Stokes and J. Carthy. Combining semantic and syntactic document classifiers to improve first story detection. In SIGIR, pages 424--425, 2001.
- R. Swan and J. Allan. Automatic generation of overview timelines. In SIGIR, pages 49--56, 2000.
- M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, pages 131--142, 2004.
- Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In SIGIR, pages 28--36, 1998.
- Y. Yang, J. Zhang, J. Carbonell, and C. Jin. Topic-conditioned novelty detection. In SIGKDD, pages 688--693, 2002.
Index Terms
- Analyzing feature trajectories for event detection