invited-talk

The Future of Data Integration

Published: 04 August 2017

Abstract

The value of data explodes when it is integrated. In this talk, I present some important innovations in data integration over the last two decades. These include data exchange [1], which provides a foundation for reasoning about the correctness of transformed data, and the use of declarative mappings in integration [2]. I discuss how data mining has been used to facilitate data integration, including constraint discovery [3], mapping discovery [4], and schema discovery to combat database decay and facilitate integration [5,6]. I then present some important new data integration challenges that arise in data science. These include the use of mining for query and visualization recommendation over massive data lakes [7] and dataset search, that is, finding datasets of interest at interactive speeds [8].

References

[1] R. Fagin, Ph. G. Kolaitis, R. J. Miller, L. Popa. Data Exchange: Semantics and Query Answering. Theoretical Computer Science, 336(1):89–124, May 2005.
[2] R. Fagin, L. M. Haas, M. A. Hernandez, R. J. Miller, L. Popa, Y. Velegrakis. Clio: Schema Mapping Creation and Data Exchange. Conceptual Modelling: Foundations & Applications, 198–236, 2009.
[3] F. Chiang, R. J. Miller. Discovering Data Quality Rules. PVLDB, 1(1):1166–1177, 2008.
[4] A. Kimmig, A. Memory, R. J. Miller, L. Getoor. A Collective Probabilistic Approach to Schema Mapping Discovery. IEEE ICDE, 921–932, 2017.
[5] R. J. Miller, P. Andritsos. Schema Discovery. IEEE Data Engineering Bulletin, 26(3):40–45, 2003.
[6] P. Andritsos, R. J. Miller, P. Tsaparas. Information-Theoretic Tools for Mining Database Structure from Large Data Sets. ACM SIGMOD, 33(2):731–742, 2004.
[7] E. Kandogan, M. Roth, P. M. Schwarz, J. Hui, I. G. Terrizzano, C. Christodoulakis, R. J. Miller. LabBook: Metadata-Driven Social Collaborative Data Analysis. IEEE Big Data, 431–440, 2015.
[8] E. Zhu, F. Nargesian, K. Q. Pu, R. J. Miller. LSH Ensemble: Internet-Scale Domain Search. PVLDB, 9(12):1185–1196, 2016.


Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. data science

Qualifiers

  • Invited-talk

Conference

KDD '17

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Cited By

  • (2020) Py_ape: Text Data Acquiring, Extracting, Cleaning and Schema Matching in Python. Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications, 78-89. DOI: 10.1007/978-981-33-4370-2_6. Online publication date: 19-Nov-2020
  • (2019) Getting Rid of Data. Journal of Data and Information Quality, 12:1, 1-7. DOI: 10.1145/3326920. Online publication date: 11-Nov-2019
  • (2019) Human-Centered Study of Data Science Work Practices. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3290607.3299018. Online publication date: 2-May-2019
  • (2019) How Data Science Workers Work with Data. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3290605.3300356. Online publication date: 2-May-2019
