DOI: 10.1145/2949689.2949702
poster

Data Exchange with MapReduce: A First Cut

Published: 18 July 2016

Abstract

Data exchange is one of the oldest database problems, being of both practical and theoretical interest. Given the pace at which heterogeneous data are published on the web, thanks to initiatives such as Linked Data and Open Science, the scalability of data exchange becomes crucial. Pivotal to data exchange is the chase algorithm, a fixpoint algorithm that evaluates both source-to-target constraints and target constraints in the data exchange process. In this paper, we investigate how new programming models such as MapReduce can be used to implement the chase on large-scale data sources. To the best of our knowledge, how to exchange data at scale has not been investigated so far. We present an initial solution for chasing source-to-target tuple-generating dependencies and target tuple-generating dependencies, and discuss open issues that need to be addressed to leverage MapReduce for the data exchange problem.
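
To make the fixpoint idea in the abstract concrete, the following is a minimal single-machine sketch of a chase organised as repeated map and reduce rounds. It is not the authors' implementation. The relation names (Emp, Works, Mgr, Reports), the "?x" variable convention, the "_N1"-style labelled nulls, and every function name are assumptions made for illustration. Each round's map phase finds the triggers of every tuple-generating dependency and emits head facts, inventing a fresh null for each existential variable; the reduce phase groups the emitted facts by relation and removes duplicates. The loop stops when a round adds nothing new, which is only guaranteed for terminating dependency sets such as weakly acyclic ones.

```python
from itertools import count

_nulls = count(1)  # generator of fresh labelled nulls (naming is an assumption)


def is_var(term):
    # Convention for this sketch only: variables are strings starting with "?".
    return isinstance(term, str) and term.startswith("?")


def matches(atoms, facts, binding=None):
    """Yield every extension of `binding` that maps all atoms onto facts."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, terms), rest = atoms[0], atoms[1:]
    for fact in facts.get(rel, ()):
        if len(fact) != len(terms):
            continue
        b = dict(binding)
        # A variable must agree with its earlier binding; a constant must be equal.
        if all((b.setdefault(t, v) == v) if is_var(t) else (t == v)
               for t, v in zip(terms, fact)):
            yield from matches(rest, facts, b)


def map_phase(tgds, body_facts, head_facts):
    """Emit (relation, fact) pairs for every trigger whose head is not yet satisfied."""
    emitted = []
    for body, head in tgds:
        for b in matches(body, body_facts):
            if next(matches(head, head_facts, b), None) is not None:
                continue  # head already holds for this trigger, do not fire
            b = dict(b)
            for _, terms in head:
                for t in terms:
                    if is_var(t) and t not in b:
                        b[t] = f"_N{next(_nulls)}"  # existential variable
            for rel, terms in head:
                emitted.append((rel, tuple(b.get(t, t) for t in terms)))
    return emitted


def reduce_phase(emitted, target):
    """Group emitted facts by relation, drop duplicates, report whether anything is new."""
    changed = False
    for rel, fact in emitted:
        bucket = target.setdefault(rel, set())
        if fact not in bucket:
            bucket.add(fact)
            changed = True
    return changed


def chase(source, st_tgds, target_tgds):
    target = {}
    # One round suffices for source-to-target tgds: their bodies read only the source.
    reduce_phase(map_phase(st_tgds, source, target), target)
    # Target tgds are iterated until a round adds no new fact (the fixpoint).
    while reduce_phase(map_phase(target_tgds, target, target), target):
        pass
    return target


# Hypothetical example: Emp(n, d) -> exists m. Works(n, d), Mgr(d, m)
source = {"Emp": {("alice", "db"), ("bob", "db")}}
st_tgds = [([("Emp", ("?n", "?d"))],
            [("Works", ("?n", "?d")), ("Mgr", ("?d", "?m"))])]
# Works(n, d), Mgr(d, m) -> Reports(n, m)
target_tgds = [([("Works", ("?n", "?d")), ("Mgr", ("?d", "?m"))],
                [("Reports", ("?n", "?m"))])]

result = chase(source, st_tgds, target_tgds)
```

Because all triggers in a round are evaluated against the same snapshot of the target, this sketch can fire more triggers than a sequential chase would; on a cluster, the join inside `matches` would be the step to distribute by join key.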


Published In

SSDBM '16: Proceedings of the 28th International Conference on Scientific and Statistical Database Management
July 2016
290 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Chase
  2. Data exchange
  3. MapReduce

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

SSDBM '16

Acceptance Rates

Overall Acceptance Rate 56 of 146 submissions, 38%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 85
  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 20 Feb 2025