poster

Entity relation discovery from web tables and links

Authors:
Cindy Xide Lin, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Bo Zhao, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Tim Weninger, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Jiawei Han, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Bing Liu, University of Illinois at Chicago, Chicago, IL, USA

WWW '10: Proceedings of the 19th international conference on World wide web, April 2010, Pages 1145–1146
https://doi.org/10.1145/1772690.1772846

Published: 26 April 2010

ABSTRACT

The World-Wide Web consists not only of a huge number of unstructured texts, but also of a vast amount of valuable structured data. Web tables [2] are a typical form of structured information that is pervasive on the web, and Web-scale methods that automatically extract web tables have been studied extensively [1]. Many powerful systems (e.g., OCTOPUS [4], Mesa [3]) use extracted web tables as a fundamental component.

In the database vernacular, a table is defined as a set of tuples that share the same attributes. Similarly, a web table is defined as a set of rows (corresponding to database tuples) that share the same column headers (corresponding to database attributes). To extract a web table is therefore to extract a relation on the web. In databases, tables often contain foreign keys that refer to other tables. By analogy, hyperlinks inside a web table sometimes function as foreign keys to other relations whose tuples are contained in the hyperlinks' target pages. In this paper, we explore this idea by asking: can we discover new attributes for web tables by exploring hyperlinks inside web tables?
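
The foreign-key analogy can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than something taken from the poster: the table contents, the URLs, the `Film@href` convention for storing a cell's hyperlink, and the helper `follow_links` are all hypothetical, and target pages are modeled as already-extracted attribute dictionaries instead of fetched HTML.

```python
from typing import Dict, List, Optional

# A web table: every row shares the same column headers (attributes).
# Assumption: the hyperlink of the "Film" cell is stored under "Film@href".
table: List[Dict[str, str]] = [
    {"Film": "Film A", "Year": "2001", "Film@href": "/wiki/Film_A"},
    {"Film": "Film B", "Year": "2005", "Film@href": "/wiki/Film_B"},
]

# Hypothetical target pages, keyed by URL. Each page holds the tuple of
# another relation, represented here as label -> value pairs.
pages: Dict[str, Dict[str, str]] = {
    "/wiki/Film_A": {"Director": "Director X", "Runtime": "120 min"},
    "/wiki/Film_B": {"Director": "Director Y"},
}


def follow_links(
    rows: List[Dict[str, str]],
    link_column: str,
    target_pages: Dict[str, Dict[str, str]],
    new_attribute: str,
) -> List[Dict[str, Optional[str]]]:
    """Use the hyperlink in link_column like a foreign key and join in one attribute.

    Rows whose target page is missing, or lacks the attribute, get None.
    """
    joined: List[Dict[str, Optional[str]]] = []
    for row in rows:
        target = target_pages.get(row.get(link_column, ""), {})
        extended: Dict[str, Optional[str]] = dict(row)  # copy; input stays unchanged
        extended[new_attribute] = target.get(new_attribute)
        joined.append(extended)
    return joined


extended_table = follow_links(table, "Film@href", pages, "Director")
```

Here the join adds a `Director` column to each row, which is the sense in which a hyperlink can supply a new attribute for the table.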

This poster proposes a solution that takes a web table as input. By following the hyperlinks in the table, it generates frequent patterns as candidate relations. The confidence of each candidate is evaluated, and trustworthy candidates are selected as new attributes for the table. Finally, we demonstrate the usefulness of the method through experiments on a variety of web domains.
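
The abstract does not spell out how patterns are mined or how confidence is computed, so the following is only a minimal sketch of the general shape of such a pipeline under stated assumptions. It treats each attribute label found on a linked page as a candidate, and it uses the fraction of linked pages containing that label as a stand-in for confidence. The function `discover_attributes`, the threshold `min_confidence`, and the sample pages are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def discover_attributes(
    link_targets: List[str],
    pages: Dict[str, Dict[str, str]],
    min_confidence: float = 0.6,
) -> Dict[str, float]:
    """Return candidate attribute labels with their support across linked pages.

    Assumption: "confidence" is approximated by support, i.e. the share of
    the table's link targets whose page contains the label.
    """
    total = len(link_targets)
    if total == 0:
        return {}

    counts: Counter = Counter()
    for url in link_targets:
        # Count each label at most once per target page.
        counts.update(set(pages.get(url, {})))

    return {
        label: n / total
        for label, n in counts.items()
        if n / total >= min_confidence
    }


# Hypothetical pages reached from the hyperlinks of a three-row web table.
pages = {
    "/p/1": {"Director": "X", "Runtime": "120 min"},
    "/p/2": {"Director": "Y", "Budget": "unknown"},
    "/p/3": {"Director": "Z", "Runtime": "95 min"},
}

selected = discover_attributes(["/p/1", "/p/2", "/p/3"], pages)
```

With these sample pages, `Director` appears on all three targets and `Runtime` on two of three, so both clear the 0.6 threshold, while `Budget` (one of three) is rejected. A real system would need a more robust pattern representation and confidence measure than this label-counting stand-in.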

References

[1] G. Miao, J. Tatemura, W.-P. Hsiung, A. Sawires, and L. E. Moser. Extracting data records from the web using tag path clustering. In WWW, pp. 981–990, 2009.
[2] M. J. Cafarella, A. Y. Halevy, D. Z. Wang, E. Wu, and Y. Zhang. WebTables: exploring the power of tables on the web. In VLDB, pp. 538–549, 2008.
[3] S. Mergen, J. Freire, and C. Heuser. Mesa: A Search Engine for Querying Web Tables. In SBBD, demo, 2008.
[4] M. J. Cafarella, A. Y. Halevy, and N. Khoussainova. Data Integration for the Relational Web. In VLDB, pp. 1090–1101, 2009.
[5] J. Han and J. Pei. Mining Frequent Patterns by Pattern-Growth: Methodology and Implications. In SIGKDD Explorations, pp. 13–20, 2000.
[6] A. Yates, M. Banko, M. Broadhead, M. J. Cafarella, O. Etzioni, and S. Soderland. TextRunner: Open Information Extraction on the Web. In HLT-NAACL, pp. 25–26, 2007.
[7] A. Culotta, A. McCallum, and J. Betz. Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text. In HLT-NAACL, 2006.

Index Terms

Entity relation discovery from web tables and links
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
General Chairs: Michael Rappa (North Carolina State University, USA), Paul Jones (University of North Carolina at Chapel Hill, USA)
Program Chairs: Juliana Freire (University of Utah, USA), Soumen Chakrabarti (Indian Institute of Technology, India)
Copyright © 2010 Copyright is held by the author/owner(s)
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
entity relation discovery
link
web table
Qualifiers
- poster
Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
