DOI: 10.1145/1277741.1277779
Article

Analyzing feature trajectories for event detection

Published: 23 July 2007

ABSTRACT

We consider the problem of analyzing word trajectories in both the time and frequency domains, with the specific goal of identifying important and less-reported, periodic and aperiodic words. A set of words with identical trends can be grouped together to reconstruct an event in a completely unsupervised manner. The document frequency of each word across time is treated as a time series, where each element is the document frequency - inverse document frequency (DFIDF) score at one time point. In this paper, we 1) applied spectral analysis to categorize features by event characteristics: important and less-reported, periodic and aperiodic; 2) modeled aperiodic features with a Gaussian density and periodic features with Gaussian mixture densities, and subsequently detected each feature's burst using the truncated Gaussian approach; 3) proposed an unsupervised greedy event detection algorithm to detect both aperiodic and periodic events. All of the above methods can be applied to time series data in general. We extensively evaluated our methods on the 1-year Reuters News Corpus [3] and showed that they were able to uncover meaningful aperiodic and periodic events.
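The abstract does not state the exact categorization criteria, so the following is only a minimal sketch of the general idea in step 1, under stated assumptions: a word's time series is transformed with a discrete Fourier transform, the strongest non-zero frequency gives a dominant period, a word whose dominant period exceeds half the series length is treated as aperiodic, and a hypothetical `power_threshold` separates important from less-reported words. The function names, the half-length rule, and the threshold are illustrative choices, not taken from the paper.

```python
import cmath
import math


def periodogram(series):
    """Power at frequencies k/T for k = 1..T//2, via a naive O(T^2) DFT."""
    T = len(series)
    powers = []
    for k in range(1, T // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / T)
                    for t, x in enumerate(series))
        powers.append(abs(coeff) ** 2 / T)
    return powers


def dominant_period(series):
    """Return (period, power) of the strongest non-zero frequency."""
    powers = periodogram(series)
    k = max(range(len(powers)), key=powers.__getitem__) + 1
    return len(series) / k, powers[k - 1]


def categorize(series, power_threshold):
    """Label a word's trajectory along the two axes named in the abstract.

    Assumption: a dominant period longer than half the series means the
    word bursts at most once in the window, so it is called aperiodic.
    Assumption: power_threshold is a user-chosen cutoff for importance.
    """
    period, power = dominant_period(series)
    periodicity = "aperiodic" if period > len(series) / 2 else "periodic"
    importance = "important" if power > power_threshold else "less-reported"
    return periodicity, importance


# Synthetic trajectories over 28 time points (not real corpus data).
weekly = [1 + math.cos(2 * math.pi * t / 7) for t in range(28)]
single_burst = [math.exp(-((t - 14) ** 2) / (2 * 3 ** 2)) for t in range(28)]
```

Calling `categorize(weekly, 5.0)` and `categorize(single_burst, 5.0)` shows how a weekly pattern and a one-off burst fall into different categories. In practice the input would be each word's DFIDF scores, and an FFT routine would replace the naive transform for long series.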

References

  1. Apache lucene-core 2.0.0, http://lucene.apache.org.
  2. Google news alerts, http://www.google.com/alerts.
  3. Reuters corpus, http://www.reuters.com/researchandstandards/corpus/.
  4. J. Allan. Topic Detection and Tracking. Event-based Information Organization. Kluwer Academic Publishers, 2002.
  5. J. Allan, V. Lavrenko, and H. Jin. First story detection in TDT is hard. In CIKM, pages 374–381, 2000.
  6. J. Allan, C. Wade, and A. Bolivar. Retrieval and novelty detection at the sentence level. In SIGIR, pages 314–321, 2003.
  7. T. Brants, F. Chen, and A. Farahat. A system for new event detection. In SIGIR, pages 330–337, 2003.
  8. A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1–38, 1977.
  9. G. P. C. Fung, J. X. Yu, P. S. Yu, and H. Lu. Parameter free bursty events detection in text streams. In VLDB, pages 181–192, 2005.
  10. Q. He, K. Chang, and E.-P. Lim. A model for anticipatory event detection. In ER, pages 168–181, 2006.
  11. Q. He, K. Chang, E.-P. Lim, and J. Zhang. Bursty feature representation for clustering text streams. In SDM, accepted, 2007.
  12. J. Kleinberg. Bursty and hierarchical structure in streams. In SIGKDD, pages 91–101, 2002.
  13. R. Kumar, J. Novak, P. Raghavan, and A. Tomkins. On the bursty evolution of blogspace. In WWW, pages 159–178, 2005.
  14. G. Kumaran and J. Allan. Text classification and named entities for new event detection. In SIGIR, pages 297–304, 2004.
  15. Q. Mei and C. Zhai. Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In SIGKDD, pages 198–207, 2005.
  16. W. D. Penny. Kullback-Leibler divergences of normal, gamma, Dirichlet and Wishart densities. Technical report, 2001.
  17. N. Stokes and J. Carthy. Combining semantic and syntactic document classifiers to improve first story detection. In SIGIR, pages 424–425, 2001.
  18. R. Swan and J. Allan. Automatic generation of overview timelines. In SIGIR, pages 49–56, 2000.
  19. M. Vlachos, C. Meek, Z. Vagena, and D. Gunopulos. Identifying similarities, periodicities and bursts for online search queries. In SIGMOD, pages 131–142, 2004.
  20. Y. Yang, T. Pierce, and J. Carbonell. A study of retrospective and on-line event detection. In SIGIR, pages 28–36, 1998.
  21. Y. Yang, J. Zhang, J. Carbonell, and C. Jin. Topic-conditioned novelty detection. In SIGKDD, pages 688–693, 2002.

    Published in

      SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2007
      946 pages
      ISBN:9781595935977
      DOI:10.1145/1277741

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
