DOI: 10.1145/1772690.1772846
poster

Entity relation discovery from web tables and links

Published: 26 April 2010

ABSTRACT

The World Wide Web consists not only of a huge number of unstructured texts but also of a vast amount of valuable structured data. Web tables [2] are a typical form of structured information that is pervasive on the web, and web-scale methods that automatically extract web tables have been studied extensively [1]. Many powerful systems (e.g., OCTOPUS [4], Mesa [3]) use extracted web tables as a fundamental component.

In database terms, a table is a set of tuples that share the same attributes. Similarly, a web table is a set of rows (corresponding to database tuples) that share the same column headers (corresponding to database attributes). To extract a web table is therefore to extract a relation on the web. In databases, tables often contain foreign keys that refer to other tables. By analogy, hyperlinks inside a web table sometimes function as foreign keys to other relations whose tuples are contained in the hyperlinks' target pages. In this paper, we explore this idea by asking: can we discover new attributes for web tables by exploring the hyperlinks inside them?
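The analogy between web tables and relations can be made concrete with a small data model. The following is only a minimal sketch: the names `Cell`, `WebTable`, and `link_columns`, as well as the sample film data and URLs, are hypothetical and do not come from the poster. It treats a column as a foreign-key candidate when every row in that column carries a hyperlink, which is one simple heuristic among many.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """One table cell: visible text plus an optional hyperlink target."""
    text: str
    href: Optional[str] = None  # assumption: at most one link per cell


@dataclass
class WebTable:
    """A web table: column headers (attributes) and rows (tuples)."""
    headers: list[str]
    rows: list[list[Cell]]

    def link_columns(self) -> list[str]:
        """Return headers of columns in which every row has a hyperlink.

        Such columns are candidates for acting like foreign keys into
        the relations described by the link target pages.
        """
        return [
            header
            for i, header in enumerate(self.headers)
            if self.rows and all(row[i].href for row in self.rows)
        ]


# Hypothetical sample data, for illustration only.
table = WebTable(
    headers=["Film", "Year"],
    rows=[
        [Cell("Film A", "/film/a"), Cell("1999")],
        [Cell("Film B", "/film/b"), Cell("2004")],
    ],
)
print(table.link_columns())
```

Here only the "Film" column qualifies, because the "Year" cells carry no links.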

This poster proposes a solution that takes a web table as input. By following the hyperlinks in the web table, frequent patterns are generated as candidate relations. The confidence of each candidate is evaluated, and trustworthy candidates are selected to become new attributes of the table. Finally, we show the usefulness of our method through experiments on a variety of web domains.
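The pipeline described above can be sketched roughly as follows. The abstract does not specify how patterns are represented or how confidence is computed, so this sketch makes loud assumptions: target pages are taken as already fetched and parsed into label/value pairs, a "pattern" is reduced to a label that recurs across target pages, and "confidence" is approximated by the fraction of rows whose target page contains that label. The function name `discover_attributes`, the `min_confidence` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def discover_attributes(rows, pages, min_confidence=0.6):
    """Propose new columns for a table from its link target pages.

    rows:  list of dicts, each with an 'href' key (the row's hyperlink).
    pages: dict mapping href -> {label: value} extracted from that page
           (assumed to be produced by some earlier parsing step).
    Returns (selected_labels, extended_rows).
    """
    # Follow each row's hyperlink; a missing page counts as empty.
    linked = [pages.get(row["href"], {}) for row in rows]

    # Count in how many target pages each label occurs.
    counts = Counter(label for page in linked for label in page)

    # Stand-in for confidence: share of rows whose page has the label.
    n = len(rows)
    selected = sorted(
        label for label, c in counts.items() if n and c / n >= min_confidence
    )

    # Add the selected labels as new attributes; absent values become None.
    extended = [
        {**row, **{label: page.get(label) for label in selected}}
        for row, page in zip(rows, linked)
    ]
    return selected, extended


# Hypothetical sample data, for illustration only.
rows = [
    {"Film": "Film A", "href": "/film/a"},
    {"Film": "Film B", "href": "/film/b"},
    {"Film": "Film C", "href": "/film/c"},
]
pages = {
    "/film/a": {"Director": "D1", "Runtime": "100 min"},
    "/film/b": {"Director": "D2"},
    "/film/c": {"Director": "D3", "Runtime": "95 min", "Trivia": "x"},
}
selected, extended = discover_attributes(rows, pages)
```

With these inputs, "Director" (3 of 3 pages) and "Runtime" (2 of 3) pass the 0.6 threshold, while "Trivia" (1 of 3) is rejected. A real system would need a more robust pattern representation and confidence measure than this label count.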

References

  1. G. Miao, J. Tatemura, W.-P. Hsiung, A. Sawires and L. E. Moser, Extracting data records from the web using tag path clustering, In WWW, p.981--990, 2009.
  2. M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu and Y. Zhang, WebTables: exploring the power of tables on the web, In VLDB, p.538--549, 2008.
  3. S. Mergen, J. Freire and C. Heuser, Mesa: A Search Engine for Querying Web Tables, In SBBD, demo, 2008.
  4. M. J. Cafarella, A. Y. Halevy and N. Khoussainova, Data Integration for the Relational Web, In VLDB, p.1090--1101, 2009.
  5. J. Han and J. Pei, Mining Frequent Patterns by Pattern-Growth: Methodology and Implications, In SIGKDD Exploration, p.13--20, 2000.
  6. A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni and S. Soderland, TextRunner: Open Information Extraction on the Web, In HLT-NAACL, p.25--26, 2007.
  7. A. Culotta, A. McCallum and J. Betz, Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text, In HLT-NAACL, 2006.

    • Published in

      WWW '10: Proceedings of the 19th international conference on World wide web
      April 2010
      1407 pages
      ISBN:9781605587998
      DOI:10.1145/1772690

      Copyright © 2010 Copyright is held by the author/owner(s)

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • poster

      Acceptance Rates

      Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
