DOI:10.1145/1772690.1772814
poster

Web-scale knowledge extraction from semi-structured tables

Published: 26 April 2010

Abstract

A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tables and attribute/value tables. We report the frequencies of these table types from a large-scale analysis of the Web and outline open challenges for extracting semantic triples (knowledge) from attribute/value tables. We then describe a solution to a key problem in extracting semantic triples: protagonist detection, i.e., finding the subject of the table, which is often not present in the table itself. In 79% of our Web tables, our method finds the correct protagonist among its top three returned candidates.
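
The 79% figure is a top-three hit rate: a table counts as correct when its true protagonist appears anywhere among the first three ranked candidates. The following is a minimal sketch of how such a rate could be computed. It is not the authors' evaluation code; the function name `top_k_hit_rate` and the toy candidate lists are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_hit_rate(ranked: Sequence[Sequence[str]],
                   gold: Sequence[str],
                   k: int = 3) -> float:
    """Fraction of tables whose gold protagonist is among the first k candidates.

    ranked[i] is the ranked candidate list for table i (best first);
    gold[i] is the annotated protagonist for table i.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


# Hypothetical toy data (not from the paper): three tables with ranked candidates.
ranked = [
    ["Canon EOS 7D", "Canon", "Cameras"],            # gold at rank 1: hit
    ["Home", "Products", "Specs", "Nikon D90"],      # gold at rank 4: miss for k=3
    ["Raleigh", "North Carolina"],                   # gold at rank 1: hit
]
gold = ["Canon EOS 7D", "Nikon D90", "Raleigh"]

rate = top_k_hit_rate(ranked, gold, k=3)
print(f"top-3 hit rate: {rate:.2f}")
```

With this definition, a table whose protagonist is ranked fourth or lower, or is missing from the candidate list, counts as a miss.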

    Published In

    WWW '10: Proceedings of the 19th international conference on World wide web
    April 2010
    1407 pages
    ISBN:9781605587998
    DOI:10.1145/1772690

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classification
    2. information extraction
    3. structured data
    4. web tables

    Qualifiers

    • Poster

    Conference

    WWW '10: The 19th International World Wide Web Conference
    April 26 - 30, 2010
    Raleigh, North Carolina, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2022) Extraction of Product Specifications from the Web - Going Beyond Tables and Lists. Proceedings of the 5th Joint International Conference on Data Science & Management of Data (9th ACM IKDD CODS and 27th COMAD), pp. 19-27. DOI:10.1145/3493700.3493713. Online publication date: 8-Jan-2022
    • (2020) Open Information Extraction as Additional Source for Kazakh Ontology Generation. Intelligent Information and Database Systems, pp. 86-96. DOI:10.1007/978-3-030-41964-6_8. Online publication date: 4-Mar-2020
    • (2019) A Survey on Knowledge Extraction Techniques for Web Tables. 2019 5th International Conference on Web Research (ICWR), pp. 123-127. DOI:10.1109/ICWR.2019.8765271. Online publication date: Apr-2019
    • (2019) A framework for information extraction from tables in biomedical literature. International Journal on Document Analysis and Recognition, 22:1, pp. 55-78. DOI:10.1007/s10032-019-00317-0. Online publication date: 1-Mar-2019
    • (2016) The Logical-Linguistic Model of Fact Extraction from English Texts. Information and Software Technologies, pp. 625-635. DOI:10.1007/978-3-319-46254-7_51. Online publication date: 22-Sep-2016
    • (2014) Extracting Attributes and Synonymous Attributes from Online Encyclopedias. Proceedings of the 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT) - Volume 01, pp. 290-296. DOI:10.1109/WI-IAT.2014.46. Online publication date: 11-Aug-2014
    • (2013) Semantic extraction of geographic data from web tables for big data integration. Proceedings of the 7th Workshop on Geographic Information Retrieval, pp. 19-26. DOI:10.1145/2533888.2533939. Online publication date: 5-Nov-2013
    • (2013) A web table extraction algorithm based on tree edit distance. IEEE Conference Anthology, pp. 1-6. DOI:10.1109/ANTHOLOGY.2013.6784738. Online publication date: Jan-2013
    • (2012) Mining special features to improve the performance of e-commerce product selection and resume processing. International Journal of Computational Science and Engineering, 7:1, pp. 82-95. DOI:10.1504/IJCSE.2012.046183. Online publication date: 1-Mar-2012
    • (2011) OSD-DB. Proceedings of the 13th Asia-Pacific web conference on Web technologies and applications, pp. 440-449. DOI:10.5555/1996794.1996856. Online publication date: 18-Apr-2011
