DOI: 10.1145/1871437.1871698
poster

Extracting structured information from Wikipedia articles to populate infoboxes

Published: 26 October 2010

ABSTRACT

Roughly every third Wikipedia article contains an infobox: a table that displays important facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes that can be expressed for a concept, is defined by an infobox template. Often, authors do not specify all template attributes, resulting in incomplete infoboxes.
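To make the notion of an incomplete infobox concrete, here is a minimal Python sketch that reads the attribute-value pairs of an infobox from its wikitext and reports which template attributes are absent or left empty. It assumes the common one-attribute-per-line `| name = value` layout and ignores values that span several lines or contain nested templates. The template schema, the example article, and the function names are hypothetical and are not taken from the paper.

```python
import re

# Matches one "| attribute = value" line of an infobox call (simplified:
# multi-line values and nested templates are not handled).
ATTR_LINE = re.compile(r"\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def parse_infobox(wikitext: str) -> dict[str, str]:
    """Return attribute -> value for every '| name = value' line."""
    attrs: dict[str, str] = {}
    for line in wikitext.splitlines():
        m = ATTR_LINE.match(line)
        if m:
            attrs[m.group(1)] = m.group(2)
    return attrs


def missing_attributes(infobox: dict[str, str], schema: list[str]) -> list[str]:
    """Template attributes that are absent or left empty in the article."""
    return [attr for attr in schema if not infobox.get(attr)]


# Hypothetical article and template schema, for illustration only.
article = """{{Infobox settlement
| name             = Exampletown
| population_total =
| area_total_km2   = 42.0
}}"""
schema = ["name", "population_total", "area_total_km2", "elevation_m"]

infobox = parse_infobox(article)
print(missing_attributes(infobox, schema))  # ['population_total', 'elevation_m']
```

The empty `population_total` and the undeclared `elevation_m` are the kind of gaps a system that populates infoboxes from article text would try to fill.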

With iPopulator, we introduce a system that automatically populates infoboxes of Wikipedia articles by extracting attribute values from the article's text. In contrast to prior work, iPopulator detects and exploits the structure of attribute values to independently extract value parts. We have tested iPopulator on the entire set of infobox templates and provide a detailed analysis of its effectiveness. For instance, we achieve an average extraction precision of 91% for 1,727 distinct infobox template attributes.
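The idea of detecting the structure of attribute values and extracting their parts independently can be illustrated with a small Python sketch. It assumes, purely for illustration, that a value's structure is the sequence of its token classes (numbers, words, literal punctuation) and that the most frequent such sequence among the existing values of an attribute is representative. Under that assumption a value such as `3,500 (2010)` splits into a count and a year. The regular expression, the token classes, the example values, and all function names are hypothetical; iPopulator's actual structure-discovery and extraction procedure is not reproduced here.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical token classes; punctuation is kept literally in the structure.
TOKEN_RE = re.compile(r"(?P<Number>\d[\d,.]*)|(?P<Word>[^\W\d_]+)|(?P<Punct>[^\w\s])")


def tokenize(value: str) -> list[tuple[str, str]]:
    """Split a value into (label, text) pieces.

    Numbers and words get a class label; each punctuation mark is its own label.
    Adjacent pieces of the same class are merged, so 'New York' is one Word.
    """
    pieces: list[tuple[str, int, int]] = []
    for m in TOKEN_RE.finditer(value):
        kind = m.lastgroup
        label = m.group() if kind == "Punct" else kind
        if kind != "Punct" and pieces and pieces[-1][0] == label:
            pieces[-1] = (label, pieces[-1][1], m.end())  # extend the current run
        else:
            pieces.append((label, m.start(), m.end()))
    return [(label, value[start:end]) for label, start, end in pieces]


def structure(value: str) -> tuple[str, ...]:
    """The sequence of labels describing a value, e.g. ('Number', '(', 'Number', ')')."""
    return tuple(label for label, _ in tokenize(value))


def dominant_structure(values: list[str]) -> tuple[str, ...]:
    """Most frequent structure among the values observed for one attribute."""
    return Counter(structure(v) for v in values).most_common(1)[0][0]


def value_parts(value: str, pattern: tuple[str, ...]) -> Optional[list[str]]:
    """Non-punctuation parts of value if it follows pattern, else None."""
    pieces = tokenize(value)
    if tuple(label for label, _ in pieces) != pattern:
        return None
    return [text for label, text in pieces if label in ("Number", "Word")]


# Hypothetical values of one population-like attribute across articles.
observed = ["3,500 (2010)", "12,000 (2008)", "780", "41,200 (2011)"]
pattern = dominant_structure(observed)
print(pattern)                                # ('Number', '(', 'Number', ')')
print(value_parts("41,200 (2011)", pattern))  # ['41,200', '2011']
print(value_parts("780", pattern))            # None
```

One plausible benefit of splitting values this way is that each part (here, the number and the year) can be located in the text by its own extractor, instead of requiring a single extractor to reproduce the whole formatted string.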


    Published in

      CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
      October 2010
      2036 pages
      ISBN: 9781450300995
      DOI: 10.1145/1871437

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall acceptance rate: 1,861 of 8,427 submissions, 22%
