DOI: 10.1145/1376616.1376700
research-article

Interactive generation of integrated schemas

Published: 09 June 2008

ABSTRACT

Schema integration is the problem of creating a unified target schema based on a set of existing source schemas that relate to each other via specified correspondences. The unified schema gives a standard representation of the data, thus offering a way to deal with the heterogeneity in the sources. In this paper, we develop a method and a design tool that provide: 1) adaptive enumeration of multiple interesting integrated schemas, and 2) easy-to-use capabilities for refining the enumerated schemas via user interaction. Our method is a departure from previous approaches to schema integration, which do not offer a systematic exploration of the possible integrated schemas.

The method operates at a logical level, where we recast each source schema into a graph of concepts with Has-A relationships. We then identify matching concepts in different graphs by taking into account the correspondences between their attributes. For every pair of matching concepts, we have two choices: merge them into one integrated concept or keep them as separate concepts. We develop an algorithm that can systematically output, without duplication, all possible integrated schemas resulting from these choices. For each integrated schema, the algorithm also generates a mapping from the source schemas to the integrated schema that has precise information-preserving properties. Furthermore, we avoid a full enumeration by allowing users to specify constraints on the merging process, based on the schemas produced so far. These constraints are then incorporated in the enumeration of the subsequent schemas. The result is an adaptive and interactive enumeration method that significantly reduces the space of alternative schemas and facilitates the selection of the final integrated schema.
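The merge-or-keep choice described above can be illustrated with a small brute-force sketch. It is not the paper's algorithm, which the abstract does not spell out. It only shows why naive enumeration produces duplicates (merging is transitive, so different choice vectors can yield the same grouping) and how user constraints prune the space. The function name `enumerate_integrations` and the parameters `must_merge` and `must_separate` are hypothetical, and concepts are reduced to plain labels without attributes or Has-A edges.

```python
from itertools import product


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def enumerate_integrations(concepts, matches, must_merge=(), must_separate=()):
    """Yield each distinct grouping of concepts exactly once.

    concepts:      iterable of concept labels (hypothetical simplification)
    matches:       list of (a, b) pairs of matching concepts
    must_merge:    user constraint, pairs that must share a group
    must_separate: user constraint, pairs that must not share a group
    """
    concepts = list(concepts)
    # Brute force over every merge/keep vector: exponential in len(matches).
    for choices in product((False, True), repeat=len(matches)):
        parent = {c: c for c in concepts}
        for (a, b), merge in zip(matches, choices):
            if merge:
                parent[find(parent, a)] = find(parent, b)

        def same(a, b):
            return find(parent, a) == find(parent, b)

        # Skip vectors where a "keep" pair was merged transitively anyway.
        # Exactly one consistent vector exists per grouping, so no
        # duplicate grouping is ever yielded.
        if any(not merge and same(a, b)
               for (a, b), merge in zip(matches, choices)):
            continue
        # Apply the user-specified constraints.
        if any(not same(a, b) for a, b in must_merge):
            continue
        if any(same(a, b) for a, b in must_separate):
            continue

        groups = {}
        for c in concepts:
            groups.setdefault(find(parent, c), set()).add(c)
        yield frozenset(frozenset(g) for g in groups.values())


if __name__ == "__main__":
    concepts = ["A", "B", "C"]
    matches = [("A", "B"), ("B", "C"), ("A", "C")]
    for schema in enumerate_integrations(concepts, matches):
        print(sorted(sorted(group) for group in schema))
```

With three mutually matching concepts there are eight choice vectors but only five distinct groupings. Adding a constraint such as `must_separate=[("A", "C")]` leaves three of them, which mirrors in miniature how user feedback narrows the remaining alternatives.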


          • Published in

            SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
            June 2008
            1396 pages
            ISBN:9781605581026
            DOI:10.1145/1376616

            Copyright © 2008 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

            Overall Acceptance Rate: 785 of 4,003 submissions, 20%
