ABSTRACT
Schema integration is the problem of creating a unified target schema based on a set of existing source schemas that relate to each other via specified correspondences. The unified schema gives a standard representation of the data, thus offering a way to deal with the heterogeneity in the sources. In this paper, we develop a method and a design tool that provide: 1) adaptive enumeration of multiple interesting integrated schemas, and 2) easy-to-use capabilities for refining the enumerated schemas via user interaction. Our method is a departure from previous approaches to schema integration, which do not offer a systematic exploration of the possible integrated schemas.
The method operates at a logical level, where we recast each source schema into a graph of concepts with Has-A relationships. We then identify matching concepts in different graphs by taking into account the correspondences between their attributes. For every pair of matching concepts, we have two choices: merge them into one integrated concept or keep them as separate concepts. We develop an algorithm that systematically outputs, without duplication, all possible integrated schemas resulting from these choices. For each integrated schema, the algorithm also generates a mapping from the source schemas to the integrated schema that has precise information-preserving properties. Furthermore, we avoid a full enumeration by allowing users to specify constraints on the merging process, based on the schemas produced so far. These constraints are then incorporated in the enumeration of the subsequent schemas. The result is an adaptive and interactive enumeration method that significantly reduces the space of alternative schemas and facilitates the selection of the final integrated schema.
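To make the merge-or-keep choice concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it tries every combination of choices by brute force, which takes exponential time, and discards repeats by comparing the resulting groupings of concepts. The names `partition_for`, `enumerate_schemas`, `must_merge`, and `must_separate` are hypothetical, and the sketch assumes that user constraints take the form of concept pairs that must or must not end up merged. It also leaves out attributes, Has-A relationships, and mapping generation.

```python
from itertools import product


def partition_for(concepts, merged_pairs):
    """Group concepts into integrated concepts using union-find.

    Merging is transitive: merging (A, B) and (B, C) puts A, B and C
    into the same integrated concept.
    """
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent links to the representative, halving paths as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in merged_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for c in concepts:
        groups.setdefault(find(c), set()).add(c)
    # A frozenset of frozensets is hashable and independent of ordering.
    return frozenset(frozenset(g) for g in groups.values())


def enumerate_schemas(concepts, matches, must_merge=(), must_separate=()):
    """Yield each distinct grouping of concepts exactly once.

    Naive illustration: tries all 2^n merge/keep assignments and filters
    duplicates and constraint violations (assumed constraint format).
    """
    seen = set()
    for choices in product([False, True], repeat=len(matches)):
        merged = [pair for pair, merge in zip(matches, choices) if merge]
        merged.extend(must_merge)
        part = partition_for(concepts, merged)
        if part in seen:
            continue  # different choices, same integrated schema
        seen.add(part)
        violates = any(
            any(a in group and b in group for group in part)
            for a, b in must_separate
        )
        if not violates:
            yield part


concepts = ["A", "B", "C"]
matches = [("A", "B"), ("B", "C"), ("A", "C")]

all_schemas = list(enumerate_schemas(concepts, matches))
constrained = list(
    enumerate_schemas(concepts, matches, must_separate=[("A", "C")])
)
```

With three mutually matching concepts, the eight merge/keep assignments collapse into five distinct groupings, because merging any two pairs forces all three concepts together. Adding the assumed constraint that A and C stay separate leaves three candidates, which illustrates how user feedback can shrink the space before further schemas are produced.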