DOI: 10.1145/2932194.2932203
research-article

Accurate fact harvesting from natural language text in Wikipedia with Lector

Published: 26 June 2016

ABSTRACT

Many approaches have recently been introduced to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly from its structured components such as infoboxes. Although these structures are valuable, they represent only a fraction of the information expressed in the articles. In this work, we quantify the number of facts that can be harvested with high precision from the text of Wikipedia articles using information extraction techniques bootstrapped from the entities and relations already in a KG. Our experimental evaluation, which uses Freebase as the reference KG, reveals that we can augment several relations in the domain of people by more than 10%, with facts whose accuracy is over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO, and DBpedia.


  • Published in

    ACM Other conferences
    WebDB '16: Proceedings of the 19th International Workshop on Web and Databases
    June 2016
    59 pages
    ISBN: 9781450343107
    DOI: 10.1145/2932194

    Copyright © 2016 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    WebDB '16 paper acceptance rate: 9 of 29 submissions, 31%. Overall acceptance rate: 30 of 100 submissions, 30%.
