ABSTRACT
Many approaches have been introduced recently to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly from its structured components such as infoboxes. Although these structures are valuable, they represent only a fraction of the information expressed in the articles. In this work, we quantify the number of facts that can be harvested with high precision from the text of Wikipedia articles using information extraction techniques bootstrapped from the entities and relations already in a KG. Our experimental evaluation, which uses Freebase as the reference KG, reveals that we can augment several relations in the domain of people by more than 10%, with facts whose accuracy is over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO, and DBpedia.