DOI: 10.1145/3308558.3313711
research-article

Learning Semantic Models of Data Sources Using Probabilistic Graphical Models

Published: 13 May 2019

ABSTRACT

A semantic model of a data source is a representation of the concepts and relationships contained in the data. Building semantic models is a prerequisite to automatically publishing data to a knowledge graph. However, creating these semantic models is a complex, error-prone process that requires considerable manual effort. In this paper, we present a novel approach that efficiently searches over the combinatorial space of possible semantic models and applies a probabilistic graphical model to identify the most probable semantic model for a data source. Probabilistic graphical models offer many advantages over existing methods: they are robust to noisy inputs and provide a straightforward approach for exploiting relationships within the data. Our solution uses a conditional random field (CRF) to encode structural patterns and enforce conceptual consistency within the semantic model. In an empirical evaluation, our approach outperforms state-of-the-art systems by an average of 8.4% in F1 score, even with noisy input data.
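
The idea of picking the most probable candidate under a log-linear model, the general form a CRF takes, can be illustrated with a toy sketch. This is not the paper's model: the feature names, the weights, and the candidate list below are hypothetical, and the normalization runs only over the few candidates listed rather than over the full search space.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature weights; a real CRF would learn these from training data.
WEIGHTS: Dict[str, float] = {
    "matches_known_pattern": 1.5,
    "consistent_types": 1.0,
    "extra_link": -0.5,
}


def score(features: Dict[str, float]) -> float:
    """Unnormalized log-score: weighted sum of the candidate's feature values."""
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())


def probabilities(candidates: List[Dict[str, float]]) -> List[float]:
    """Softmax over candidate scores, normalized over the listed candidates only."""
    scores = [score(c) for c in candidates]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def most_probable(candidates: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return the index and probability of the highest-probability candidate."""
    probs = probabilities(candidates)
    best = max(range(len(candidates)), key=probs.__getitem__)
    return best, probs[best]


# Three hypothetical candidate semantic models, described only by toy features.
candidates = [
    {"matches_known_pattern": 1, "consistent_types": 1, "extra_link": 0},
    {"matches_known_pattern": 0, "consistent_types": 1, "extra_link": 2},
    {"matches_known_pattern": 1, "consistent_types": 0, "extra_link": 1},
]
best_index, best_prob = most_probable(candidates)
print(best_index, round(best_prob, 3))
```

Here the first candidate wins because it gains weight from both positive features and pays no penalty for extra links.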

  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
