Research article. DOI: 10.1145/2932194.2932197

Fusing time-dependent web table data

Published: 26 June 2016

Abstract

A subset of the HTML tables on the Web contains relational data. The data in these tables covers a multitude of topics and is thus very useful for complementing or validating cross-domain knowledge bases, such as DBpedia, YAGO, or the Google Knowledge Graph. A large fraction of the data in these knowledge bases is time-dependent, meaning that the correctness of an attribute value depends on a point in time. Fusing data from web tables in order to determine correct values for time-dependent attributes is challenging, as most web tables do not contain timestamp information. One way to deal with this sparsity is to exploit timestamps that appear in different locations on the web page around the table. However, as these timestamps might not apply to the web table value in question, this approach introduces noise. This paper investigates the extent to which the performance of data fusion strategies that rely on voting, PageRank, and Knowledge-Based Trust can be improved by incorporating noisy and sparse timestamp information. To this end, we present a machine-learning-based approach that considers different types of noisy timestamps in the data fusion process, and we experiment with propagating timestamp information between web tables in order to overcome sparsity. We evaluate the data fusion strategies using a large public corpus of web tables and a public gold standard of time-dependent attribute values. We find that our methods effectively choose and weigh timestamp information per attribute and reduce sparsity using propagation. By incorporating timestamp information into data fusion strategies that previously did not exploit temporal metadata, we are able to increase the F1-measure by 5% on average.
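
The idea of adding timestamp information to a voting-based fusion strategy can be illustrated with a minimal sketch. It is not the method from the paper: the paper learns how to choose and weigh timestamps per attribute, whereas this sketch assumes a fixed exponential decay. The names `Claim`, `timestamp_weight`, `fuse`, and the parameters `decay` and `missing` are hypothetical, and the example values are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    """One candidate value for an attribute, taken from a single web table."""
    value: str
    source_score: float   # assumed trust score of the source (1.0 gives plain voting)
    year: Optional[int]   # assumed noisy timestamp near the table; None if absent


def timestamp_weight(year, target_year, decay=0.5, missing=0.25):
    """Down-weight claims whose timestamp is far from the target year.

    Both `decay` and `missing` are illustrative constants, not learned weights.
    """
    if year is None:
        return missing
    return decay ** abs(year - target_year)


def fuse(claims, target_year, use_timestamps=True):
    """Return the value with the highest (optionally time-weighted) vote."""
    scores = defaultdict(float)
    for claim in claims:
        weight = timestamp_weight(claim.year, target_year) if use_timestamps else 1.0
        scores[claim.value] += claim.source_score * weight
    return max(scores, key=scores.get)


# Invented example: three tables state a city's population; we want the 2010 value.
example_claims = [
    Claim("1.2M", 1.0, 2000),   # outdated table
    Claim("1.2M", 1.0, None),   # table without any timestamp
    Claim("1.4M", 1.0, 2010),   # table whose timestamp matches the target year
]

print(fuse(example_claims, 2010, use_timestamps=False))  # plain voting
print(fuse(example_claims, 2010))                        # time-weighted voting
```

In this invented example, plain voting picks the value backed by two tables, while the time-weighted variant prefers the single value whose timestamp matches the target year. Replacing `source_score` with a PageRank or trust estimate would give a weighted variant of the same scheme.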


Cited By

  • (2019) Synthesizing N-ary Relations from Web Tables. Proceedings of the 9th International Conference on Web Intelligence, Mining and Semantics, pp. 1-12. DOI: 10.1145/3326467.3326480. Online publication date: 26-Jun-2019
  • (2019) Profiling the semantics of n-ary web table data. Proceedings of the International Workshop on Semantic Big Data, pp. 1-6. DOI: 10.1145/3323878.3325806. Online publication date: 5-Jul-2019
  • (2017) Retrieval, Crawling and Fusion of Entity-centric Data on the Web. Semantic Keyword-Based Search on Structured Data Sources, pp. 3-16. DOI: 10.1007/978-3-319-53640-8_1. Online publication date: 15-Feb-2017

Published In

WebDB '16: Proceedings of the 19th International Workshop on Web and Databases
June 2016
59 pages
ISBN: 9781450343107
DOI: 10.1145/2932194

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California

Acceptance Rates

WebDB '16 paper acceptance rate: 9 of 29 submissions (31%)
Overall acceptance rate: 30 of 100 submissions (30%)

