ABSTRACT
A large number of news articles are published on the Web every day, and automatically identifying events in such a large document collection is a challenging problem. In this paper, we propose two event detection approaches based on generative models. The first combines the popular latent Dirichlet allocation (LDA) model with temporal segmentation and spatial clustering. The second adapts spatial latent Dirichlet allocation (SLDA), a model originally developed for image segmentation, to spatio-temporal event detection on text. Our experiments show that both approaches outperform traditional content-based clustering approaches on our datasets.
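The abstract does not give implementation details, so the following is only an illustrative sketch of how topic assignments might be combined with temporal segmentation and spatial grouping; it is not the authors' method. It assumes each document already carries a dominant topic index (for example, from a previously fitted LDA model), an integer day index, and a place name. The function name `detect_events`, the dictionary keys, and the `max_gap_days` threshold are all hypothetical, and exact matching on the place name stands in for a real spatial clustering step.

```python
from collections import defaultdict


def detect_events(docs, max_gap_days=2):
    """Group documents into candidate events (illustrative sketch only).

    docs: list of dicts with hypothetical keys
      'id'    - document identifier
      'topic' - dominant topic index, assumed to come from a fitted LDA model
      'day'   - integer day index of publication
      'place' - location name (exact match is a stand-in for spatial clustering)
    """
    # Step 1: bucket documents by (topic, place).
    buckets = defaultdict(list)
    for doc in docs:
        buckets[(doc["topic"], doc["place"])].append(doc)

    events = []
    for (topic, place), group in sorted(buckets.items()):
        group.sort(key=lambda d: d["day"])
        # Step 2: temporal segmentation. Start a new segment whenever the
        # gap between consecutive documents exceeds max_gap_days.
        segment = [group[0]]
        for doc in group[1:]:
            if doc["day"] - segment[-1]["day"] > max_gap_days:
                events.append((topic, place, [d["id"] for d in segment]))
                segment = [doc]
            else:
                segment.append(doc)
        events.append((topic, place, [d["id"] for d in segment]))
    return events


# Toy input: three documents on topic 0 in Paris, one on topic 1 in Tokyo.
docs = [
    {"id": "a", "topic": 0, "day": 1, "place": "Paris"},
    {"id": "b", "topic": 0, "day": 2, "place": "Paris"},
    {"id": "c", "topic": 0, "day": 10, "place": "Paris"},
    {"id": "d", "topic": 1, "day": 2, "place": "Tokyo"},
]
events = detect_events(docs)
for event in events:
    print(event)
```

In this toy input, documents `a` and `b` fall into one candidate event, while `c` shares their topic and place but is split off by the eight-day gap. A real system would replace the exact place match with a proper spatial clustering step and would choose the gap threshold empirically.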
Index Terms
- Event detection with spatial latent Dirichlet allocation