DOI: 10.1145/1998076.1998141
research-article

Event detection with spatial latent Dirichlet allocation

Published: 13 June 2011

ABSTRACT

A large number of news articles are published on the Web every day, and automatically identifying events in such a large document collection is a challenging problem. In this paper, we propose two event detection approaches based on generative models. The first combines the popular latent Dirichlet allocation (LDA) model with temporal segmentation and spatial clustering. The second adapts an image segmentation model, spatial LDA (SLDA), to spatio-temporal event detection in text. Our experiments show that both approaches outperform traditional content-based clustering approaches on our datasets.
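The abstract does not spell out how LDA output, temporal segmentation, and spatial clustering are combined in the first approach. As a rough illustration only, the sketch below assumes each article already carries a dominant LDA topic, a day index, and a geocoded latitude/longitude. It buckets articles by topic and by fixed-width time window (a simple stand-in for temporal segmentation), then links articles within a distance threshold using single-linkage clustering. The names `Doc`, `detect_events`, `window_days`, and `max_km`, the fixed windows, and the 50 km threshold are all hypothetical choices, not the paper's method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A news article with precomputed metadata (hypothetical schema)."""
    doc_id: str
    topic: int   # dominant topic from an already-fitted LDA model (assumed)
    day: int     # publication date as an integer day index
    lat: float   # geocoded latitude in degrees (assumed available)
    lon: float   # geocoded longitude in degrees


def haversine_km(a: Doc, b: Doc) -> float:
    """Great-circle distance between two articles' locations, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def detect_events(docs, window_days=3, max_km=50.0):
    """Group docs into candidate events: same topic, same time window, spatially linked.

    Returns a list of (topic, window_index, [doc_ids]) tuples.
    """
    # Step 1: temporal segmentation, here just fixed-width windows of `window_days` days.
    buckets = defaultdict(list)
    for d in docs:
        buckets[(d.topic, d.day // window_days)].append(d)

    events = []
    for topic, window in sorted(buckets):
        group = buckets[(topic, window)]
        # Step 2: single-linkage spatial clustering with a union-find forest:
        # articles share a cluster if a chain of pairs within max_km connects them.
        parent = list(range(len(group)))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if haversine_km(group[i], group[j]) <= max_km:
                    parent[find(i)] = find(j)

        clusters = defaultdict(list)
        for i, d in enumerate(group):
            clusters[find(i)].append(d.doc_id)
        for ids in clusters.values():
            events.append((topic, window, ids))
    return events


if __name__ == "__main__":
    docs = [
        Doc("a", 0, 1, 40.71, -74.00),   # New York
        Doc("b", 0, 2, 40.73, -73.99),   # New York, same topic and window as "a"
        Doc("c", 0, 2, 34.05, -118.24),  # Los Angeles: same topic and window, far away
        Doc("d", 1, 1, 40.71, -74.00),   # different topic
        Doc("e", 0, 7, 40.71, -74.00),   # same topic, later window
    ]
    for event in detect_events(docs):
        print(event)
```

Running the script prints four candidate events: the two New York articles merge into one, while the Los Angeles article, the off-topic article, and the later article each form their own group. Fixed windows and a single distance threshold keep the sketch short; an adaptive segmentation of each topic's time series or a density-based spatial clustering would be equally plausible substitutes.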


        Published in

        ACM Conferences
        JCDL '11: Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
        June 2011
        500 pages
        ISBN:9781450307444
        DOI:10.1145/1998076

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall acceptance rate: 415 of 1,482 submissions (28%)
