Research article, DOI: 10.1145/1529282.1529672

A sentence level probabilistic model for evolutionary theme pattern mining from news corpora

Published: 08 March 2009

Abstract

Several recent topic-model-based methods have been proposed to discover and summarize the evolutionary patterns of themes in temporal text collections. However, the theme patterns these methods extract are hard to interpret and evaluate. To produce a more descriptive representation of theme patterns, we represent sentences and themes with named entities and propose a sentence-level probabilistic model built on this representation. Compared with other topic-model methods, our approach yields not only each topic's distribution over terms but also candidate summary sentences for each theme. Consequently, the results are easier to understand and can be evaluated using the top sentences produced by the model. Experiments on the Tsunami dataset show that the proposed methods are useful for discovering evolutionary theme patterns.
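
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of assigning themes at the sentence level. It fits a plain mixture of unigrams with EM, treating each sentence as generated by one latent theme, and then ranks sentences by their posterior probability for a theme to obtain candidate summary sentences. It is not the authors' model: it ignores the named-entity representation and the temporal evolution of themes, and the function names, parameters, initialisation, and toy sentences are all hypothetical.

```python
import math
from collections import Counter


def fit_sentence_mixture(sentences, num_themes, iterations=30, smoothing=0.01):
    """Fit a mixture of unigrams by EM, with one latent theme per sentence.

    `sentences` is a list of token lists. Returns (prior, word_prob, resp):
    theme priors, per-theme term distributions, and per-sentence posteriors.
    """
    n, k = len(sentences), num_themes
    vocab = sorted({tok for sent in sentences for tok in sent})
    counts = [Counter(sent) for sent in sentences]

    # Deterministic, asymmetric start (an arbitrary choice for this sketch):
    # sentence i initially leans towards theme i % k.
    resp = []
    for i in range(n):
        row = [1.0 + (9.0 if j == i % k else 0.0) for j in range(k)]
        total = sum(row)
        resp.append([v / total for v in row])

    for _ in range(iterations):
        # M-step: re-estimate theme priors and smoothed term distributions.
        prior = [(sum(r[j] for r in resp) + smoothing) / (n + k * smoothing)
                 for j in range(k)]
        word_prob = []
        for j in range(k):
            weighted = {w: smoothing for w in vocab}
            for r, c in zip(resp, counts):
                for w, cnt in c.items():
                    weighted[w] += r[j] * cnt
            total = sum(weighted.values())
            word_prob.append({w: v / total for w, v in weighted.items()})

        # E-step: posterior over themes for each sentence, computed in log space.
        for i, c in enumerate(counts):
            log_post = [
                math.log(prior[j])
                + sum(cnt * math.log(word_prob[j][w]) for w, cnt in c.items())
                for j in range(k)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]

    return prior, word_prob, resp


def top_sentences(resp, theme, limit=3):
    """Indices of the sentences with the highest posterior for `theme`."""
    order = sorted(range(len(resp)), key=lambda i: resp[i][theme], reverse=True)
    return order[:limit]


if __name__ == "__main__":
    # Made-up toy sentences, already tokenised.
    toy = [
        ["tsunami", "wave", "coast"],
        ["aid", "donation", "relief"],
        ["tsunami", "wave", "warning"],
        ["aid", "relief", "fund"],
    ]
    prior, word_prob, resp = fit_sentence_mixture(toy, num_themes=2)
    for theme in range(2):
        terms = sorted(word_prob[theme], key=word_prob[theme].get, reverse=True)[:3]
        print(theme, terms, top_sentences(resp, theme, limit=2))
```

The point of the sketch is that a sentence-level posterior gives two outputs from one fit: a term distribution per theme and a ranking of sentences per theme, which is what makes the extracted themes readable as text rather than only as word lists.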

    Published In

    SAC '09: Proceedings of the 2009 ACM symposium on Applied Computing
    March 2009
    2347 pages
    ISBN:9781605581668
    DOI:10.1145/1529282
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. evolutionary theme patterns
    2. named entities
    3. temporal text mining
    4. topic model

    Qualifiers

    • Research-article

    Conference

    SAC09: The 2009 ACM Symposium on Applied Computing
    March 8 - March 12, 2009
    Honolulu, Hawaii

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Cited By

    • (2020) A computational framework for social-media-based business analytics and knowledge creation: empirical studies of CyTraSS. Enterprise Information Systems, pages 1-23. DOI: 10.1080/17517575.2020.1827299. Online publication date: 27-Oct-2020
    • (2011) Mining event temporal boundaries from news corpora through evolution phase discovery. Proceedings of the 12th international conference on Web-age information management, pages 554-565. DOI: 10.5555/2035562.2035625. Online publication date: 14-Sep-2011
    • (2011) DVD. Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications, pages 168-180. DOI: 10.5555/1996794.1996817. Online publication date: 18-Apr-2011
    • (2011) Mining Evolutionary Topic Patterns in Community Question Answering Systems. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 41(5), pages 828-833. DOI: 10.1109/TSMCA.2011.2157131. Online publication date: 1-Sep-2011
    • (2011) Mining Event Temporal Boundaries from News Corpora through Evolution Phase Discovery. Web-Age Information Management, pages 554-565. DOI: 10.1007/978-3-642-23535-1_47. Online publication date: 2011
    • (2011) DVD: A Model for Event Diversified Versions Discovery. Web Technologies and Applications, pages 168-180. DOI: 10.1007/978-3-642-20291-9_18. Online publication date: 2011
    • (2010) Topic detection by topic model induced distance using biased initiation. Proceedings of the 2010 international conference on Advances in computer science and information technology, pages 310-323. DOI: 10.5555/1875558.1875588. Online publication date: 23-Jun-2010
    • (2010) Topic Detection by Topic Model Induced Distance Using Biased Initiation. Advances in Computer Science and Information Technology, pages 310-323. DOI: 10.1007/978-3-642-13577-4_27. Online publication date: 2010
