ABSTRACT
A semantic model of a data source is a representation of the concepts and relationships contained in the data. Building semantic models is a prerequisite to automatically publishing data to a knowledge graph. However, creating these semantic models is a complex, error-prone process that requires considerable manual effort. In this paper, we present a novel approach that efficiently searches the combinatorial space of possible semantic models and applies a probabilistic graphical model to identify the most probable semantic model for a data source. Probabilistic graphical models offer many advantages over existing methods: they are robust to noisy inputs and provide a straightforward way to exploit relationships within the data. Our solution uses a conditional random field (CRF) to encode structural patterns and enforce conceptual consistency within the semantic model. In an empirical evaluation, our approach outperforms state-of-the-art systems by an average of 8.4% in F1 score, even with noisy input data.
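The idea of picking the most probable semantic model from a set of candidates can be illustrated with a small sketch. It is not the paper's implementation: it uses a log-linear score in the spirit of a CRF, normalized over a short candidate list rather than over a full factor graph. The feature names, the weights, the KNOWN_PATTERNS set, and the example classes and columns are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# A candidate semantic model is a set of (source node, relation, target node) triples.
Triple = Tuple[str, str, str]
Model = FrozenSet[Triple]

# Hypothetical weights; a real system would learn these from labeled models.
WEIGHTS: Dict[str, float] = {
    "known_pattern": 1.5,        # rewards (class, relation) pairs seen before
    "duplicate_relation": -2.0,  # penalizes one node using the same relation twice
}

# Hypothetical structural patterns, standing in for patterns mined from known models.
KNOWN_PATTERNS = {("Person", "name"), ("Person", "worksFor"), ("Organization", "name")}


def features(model: Model) -> Dict[str, float]:
    """Count how often each illustrative feature fires in a candidate model."""
    counts = {"known_pattern": 0.0, "duplicate_relation": 0.0}
    seen = set()
    for subj, rel, _obj in sorted(model):
        if (subj, rel) in KNOWN_PATTERNS:
            counts["known_pattern"] += 1
        if (subj, rel) in seen:
            counts["duplicate_relation"] += 1
        seen.add((subj, rel))
    return counts


def score(model: Model) -> float:
    """Unnormalized log-linear score: weighted sum of feature counts."""
    return sum(WEIGHTS[name] * value for name, value in features(model).items())


def rank(candidates: List[Model]) -> List[Tuple[Model, float]]:
    """Turn scores into probabilities over the candidates and sort, best first."""
    scores = [score(m) for m in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)


# Two candidate models for a table with hypothetical columns col_name and col_org.
consistent = frozenset({
    ("Person", "name", "col_name"),
    ("Person", "worksFor", "Organization"),
    ("Organization", "name", "col_org"),
})
inconsistent = frozenset({
    ("Person", "name", "col_name"),
    ("Person", "name", "col_org"),  # same relation used twice on one node
})

best_model, best_prob = rank([inconsistent, consistent])[0]
```

Here the consistency penalty is what separates the two candidates: both reuse known patterns, but the second assigns two name columns to the same Person node and is ranked lower.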