ABSTRACT
While schema mapping specification is a cumbersome task for data curation specialists, it becomes infeasible for non-expert users, who are unacquainted with the semantics and languages of the involved transformations.
In this paper, we present an interactive framework for schema mapping specification suited to non-expert users. The key intuition is to leverage a few exemplar tuples to infer the underlying mappings and to iterate the inference process via simple user interactions in the form of Boolean queries on the validity of the initial exemplar tuples. The approaches available so far mainly assume pairs of complete universal data examples, which can be provided only by data curation experts, or are limited to poorly expressive mappings.
We present several strategies for exploring the space of all possible mappings that satisfy arbitrary user exemplar tuples. During the exploration, we challenge the user to retain the mappings that best fit the user's requirements and to dynamically prune the exploration space, thus reducing the number of user interactions. We prove that the mappings obtained after the refinement process are correct. We present an extensive experimental analysis that measures the feasibility of our interactive mapping strategies and the inherent quality of the obtained mappings.
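To make the interaction loop concrete, the following Python sketch shows one way a yes/no dialogue can prune a space of candidate mappings. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `refine_premise`, the representation of exemplar atoms as plain strings, the simulated user `simulated_user`, and the monotonicity assumption (if a set of source atoms is accepted, every superset is accepted too) are all hypothetical choices made for the example.

```python
from itertools import combinations


def refine_premise(atoms, is_valid):
    """Find the minimal subsets of `atoms` that the user accepts.

    `is_valid` stands in for the user: it answers a Boolean query about
    whether a reduced set of source atoms is still a valid exemplar.
    Assumption (hypothetical): acceptance is monotone, so any superset of
    an accepted subset would also be accepted and need not be asked about.
    """
    atoms = list(atoms)
    accepted = []   # minimal accepted subsets found so far
    questions = 0   # number of Boolean queries put to the user

    # Explore candidates from the smallest subsets upwards.
    for size in range(1, len(atoms) + 1):
        for combo in combinations(atoms, size):
            candidate = frozenset(combo)
            # Prune: a superset of an accepted subset is not minimal.
            if any(found <= candidate for found in accepted):
                continue
            questions += 1
            if is_valid(candidate):
                accepted.append(candidate)
    return accepted, questions


# Hypothetical exemplar with three source atoms; the simulated user
# accepts a candidate only if it keeps both "Flight" and "Airline".
def simulated_user(candidate):
    return {"Flight", "Airline"} <= candidate


minimal, asked = refine_premise(["Flight", "Airline", "Hotel"], simulated_user)
```

In this toy run the full three-atom candidate is never shown to the user, because it contains an already accepted subset. The strategies in the paper are concerned with ordering and pruning such questions far more aggressively than this exhaustive bottom-up enumeration does.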