
A Fresh Look on Knowledge Bases: Distilling Named Events from News

Authors:
Erdal Kuzey, Max Planck Institute, Saarbrücken, Germany
Jilles Vreeken, Max Planck Institute, Saarbrücken, Germany
Gerhard Weikum, Max Planck Institute, Saarbrücken, Germany
CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, November 2014, Pages 1689–1698, https://doi.org/10.1145/2661829.2661984

Published: 03 November 2014

ABSTRACT

Knowledge bases capture millions of entities such as people, companies or movies. However, their knowledge of named events like sports finals, political scandals, or natural disasters is fairly limited, as these are continuously emerging entities. This paper presents a method for extracting named events from news articles, reconciling them into a canonicalized representation, and organizing them into fine-grained semantic classes to populate a knowledge base. Our method captures similarity measures among news articles in a multi-view attributed graph, considering textual contents, entity occurrences, and temporal ordering. For distilling canonicalized events from this raw data, we present a novel graph coarsening algorithm based on the information-theoretic principle of minimum description length. The quality of our method is experimentally demonstrated by extracting, organizing, and evaluating 25,000 events from a corpus of 300,000 heterogeneous news articles.
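
To make the idea of a multi-view graph over news articles concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which similarity measures are used, so Jaccard overlap stands in for both the textual and the entity view. The `Article` record, the `build_multiview_graph` function, and the 7-day window are hypothetical names and values chosen for illustration. The MDL-based coarsening step is not shown.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Article:
    """Hypothetical article record; field names are illustrative only."""
    doc_id: str
    tokens: frozenset      # bag-of-words reduced to a set of terms
    entities: frozenset    # identifiers of entities mentioned in the text
    published: date


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_multiview_graph(articles, max_gap_days=7):
    """Return {(earlier_id, later_id): {"text", "entity", "gap_days"}}.

    Each edge is oriented from the earlier to the later article, which
    encodes temporal ordering. A pair is kept only if it falls inside the
    time window and overlaps in at least one of the two content views.
    The window size is an assumption, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(articles, 2):
        first, second = sorted((a, b), key=lambda x: (x.published, x.doc_id))
        gap = (second.published - first.published).days
        if gap > max_gap_days:
            continue
        text_sim = jaccard(first.tokens, second.tokens)
        entity_sim = jaccard(first.entities, second.entities)
        if text_sim == 0.0 and entity_sim == 0.0:
            continue
        edges[(first.doc_id, second.doc_id)] = {
            "text": text_sim,
            "entity": entity_sim,
            "gap_days": gap,
        }
    return edges
```

A coarsening step in the spirit of the abstract would then merge groups of strongly connected articles in such a graph into single event nodes. How that merge is scored under minimum description length is specific to the paper and is not reproduced here.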

Index Terms

1. Computing methodologies
  1. Modeling and simulation
    1. Simulation theory
      1. Systems theory
2. Mathematics of computing
  1. Information theory

Published in
CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN:9781450325981
DOI:10.1145/2661829
General Chairs:
Jianzhong Li, Harbin Inst. of Technology
X. Sean Wang, Fudan University
Program Chairs:
Minos Garofalakis, Technical University of Crete, Greece
Ian Soboroff, National Institute of Standards, USA
Torsten Suel, New York University, USA
Min Wang, Google Research, USA
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
event mining
information extraction
knowledge bases
minimum description length
temporal knowledge
Qualifiers
- research-article
Acceptance Rates
CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%