DOI: 10.1145/2661829.2661984

A Fresh Look on Knowledge Bases: Distilling Named Events from News

Published: 03 November 2014

ABSTRACT

Knowledge bases capture millions of entities such as people, companies, or movies. However, their knowledge of named events like sports finals, political scandals, or natural disasters is fairly limited, as these are continuously emerging entities. This paper presents a method for extracting named events from news articles, reconciling them into a canonicalized representation, and organizing them into fine-grained semantic classes to populate a knowledge base. Our method captures similarity measures among news articles in a multi-view attributed graph, considering textual contents, entity occurrences, and temporal ordering. For distilling canonicalized events from this raw data, we present a novel graph coarsening algorithm based on the information-theoretic principle of minimum description length. The quality of our method is experimentally demonstrated by extracting, organizing, and evaluating 25,000 events from a corpus of 300,000 heterogeneous news articles.
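To make the idea of description-length-driven coarsening concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a plain undirected graph whose edges stand for "these two articles are similar", and it uses a toy cost in the style of superedges plus corrections; the function names `pair_cost`, `description_length`, and `coarsen`, the cost function, and the greedy merge strategy are all illustrative assumptions. The multi-view attributes (text, entities, time) described in the abstract are not modeled here.

```python
from itertools import combinations


def pair_cost(possible, actual):
    """Cheapest encoding of the edges between two groups (toy assumption).

    Either list every actual edge as an exception (cost = actual), or
    store one superedge and list the missing edges (cost = 1 + missing).
    """
    return min(actual, 1 + possible - actual)


def description_length(groups, edges):
    """Toy cost of a partition: superedges plus corrections."""
    cost = 0
    for i, a in enumerate(groups):
        # Edges that fall inside group a.
        inside = sum(1 for u, v in combinations(sorted(a), 2)
                     if frozenset((u, v)) in edges)
        cost += pair_cost(len(a) * (len(a) - 1) // 2, inside)
        # Edges between group a and every later group b.
        for b in groups[i + 1:]:
            across = sum(1 for u in a for v in b
                         if frozenset((u, v)) in edges)
            cost += pair_cost(len(a) * len(b), across)
    return cost


def coarsen(nodes, edges):
    """Greedily apply the merge that lowers the cost most, until none does."""
    groups = [frozenset([n]) for n in nodes]
    best = description_length(groups, edges)
    while True:
        choice = None
        for a, b in combinations(groups, 2):
            merged = [g for g in groups if g not in (a, b)] + [a | b]
            cost = description_length(merged, edges)
            if cost < best:
                best, choice = cost, merged
        if choice is None:
            return groups, best
        groups = choice


if __name__ == "__main__":
    # Hypothetical data: two tightly linked article clusters joined by one edge.
    pairs = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    edges = {frozenset(p) for p in pairs}
    groups, cost = coarsen(range(6), edges)
    print(sorted(sorted(g) for g in groups), cost)
```

Under this toy cost, keeping all six nodes separate costs 7 (one unit per edge), while grouping them into the two triangles costs 3 (one superedge inside each triangle plus one exception for the bridging edge). That gap is what rewards collapsing densely connected articles into a single node; the greedy search is only a heuristic and is not guaranteed to find the cheapest partition.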


      Published in

        CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
        November 2014
        2152 pages
        ISBN: 9781450325981
        DOI: 10.1145/2661829

        Copyright © 2014 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
