skip to main content
10.1145/2723372.2750544acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open access

Identifying the Extent of Completeness of Query Answers over Partially Complete Databases

Published: 27 May 2015 Publication History

Abstract

In many applications including loosely coupled cloud databases, collaborative editing and network monitoring, data from multiple sources is regularly used for query answering. For reasons such as system failures, insufficient author knowledge or network issues, data may be temporarily unavailable or generally nonexistent. Hence, not all data needed for query answering may be available.
In this paper, we propose a natural class of completeness patterns, expressed by selections on database tables, to specify complete parts of database tables. We then show how to adapt the operators of relational algebra so that they manipulate these completeness patterns to compute completeness patterns pertaining to query answers. Our proposed algebra is computationally sound and complete with respect to the information that the patterns provide. We show that stronger completeness patterns can be obtained by considering not only the schema but also the database instance and we extend the algebra to take into account this additional information. We develop novel techniques to efficiently implement the computation of completeness patterns on query answers and demonstrate their scalability on real data.

References

[1]
S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. In SIGMOD, pages 34--48, 1987.
[2]
A. C. Acock. Working with missing values. Journal of Marriage and Family, 67(4):1012--1028, 2005.
[3]
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. Dbpedia: A nucleus for a web of open data. Springer, 2007.
[4]
L. Berti-Equille, T. Dasu, and D. Srivastava. Discovery of complex glitch patterns: A novel approach to quantitative data cleaning. In ICDE, pages 733--744. IEEE, 2011.
[5]
N. Bruno and S. Chaudhuri. Flexible database generators. In VLDB, pages 1097--1107, 2005.
[6]
K. L. Clark. Negation as failure. In Logic and data bases, pages 293--322. Springer, 1978.
[7]
A. Cortés-Calabuig, M. Denecker, O. Arieli, and M. Bruynooghe. Representation of partial knowledge and query answering in locally complete databases. In Logic for Programming, Artificial Intelligence, and Reasoning, pages 407--421, 2006.
[8]
A. Crolotte and A. Ghazal. Introducing skew into the TPC-H benchmark. In TPCTC, pages 137--145, 2011.
[9]
M. Denecker, A. Cortés-Calabuig, M. Bruynooghe, and O. Arieli. Towards a logical reconstruction of a theory for locally closed databases. ACM TODS, 35(3), 2010.
[10]
W. Fan and F. Geerts. Relative information completeness. In PODS, pages 97--106, 2009.
[11]
W. Fan and F. Geerts. Capturing missing tuples and missing values. In PODS, pages 169--178, 2010.
[12]
L. Golab and T. Johnson. Consistency in a stream warehouse. In CIDR, pages 114--122, 2011.
[13]
J. M. Hellerstein. Quantitative data cleaning for large databases. United Nations Economic Commission for Europe (UNECE), 2008.
[14]
C. R. Kalmanek, Z. Ge, S. Lee, C. Lund, D. Pei, J. Seidel, J. Van der Merwe, and J. Yates. Darkstar: Using exploratory data mining to raise the bar on network reliability and performance. In DRCN, pages 1--10. IEEE, 2009.
[15]
W. Lang, R. V. Nehme, and I. Rae. Database optimization for the cloud: Where costs, partial results, and consumer choice meet. In CIDR, 2015.
[16]
W. Lang, R. V. Nehme, E. Robinson, and J. F. Naughton. Partial results in database systems. In SIGMOD, pages 1275--1286. ACM, 2014.
[17]
A. Y. Levy. Obtaining complete answers from incomplete databases. In VLDB, pages 402--412, 1996.
[18]
L. Libkin. Incomplete data: what went wrong, and how to fix it. In PODS, 2014, pages 1--13, 2014.
[19]
D. Maier and P. A. Tucker. Punctuations. In Encyclopedia of Database Systems, pages 2216--2217. Springer, 2009.
[20]
W. McCune. Experiments with discrimination-tree indexing and path indexing for term retrieval. Journal of Automated Reasoning, 9(2):147--167, 1992.
[21]
A. Motro. Integrity = Validity Completeness. ACM TODS, 14(4):480--502, 1989.
[22]
S. Razniewski and W. Nutt. Completeness of queries over incomplete databases. In VLDB, 2011.
[23]
R. Reiter. On closed world data bases. Springer, 1978.
[24]
P. Royston. Multiple imputation of missing values. Stata Journal, 4:227--241, 2004.
[25]
V. Shkapenyuk, T. Johnson, and D. Srivastava. Bistro data feed management system. In SIGMOD, pages 1059--1070. ACM, 2011.
[26]
M. Yakout, L. Berti-Équille, and A. K. Elmagarmid. Don't be scared: use scalable automatic repairing with maximal likelihood and bounded changes. In SIGMOD, pages 553--564. ACM, 2013.

Cited By

View all
  • (2025)Can’t you answer while you wait?Annals of Mathematics and Artificial Intelligence10.1007/s10472-025-09968-8Online publication date: 17-Jan-2025
  • (2024)Hypothetical Answers to Continuous Queries Over Data StreamsACM Transactions on Computational Logic10.1145/368884525:4(1-40)Online publication date: 17-Aug-2024
  • (2024)Completeness, Recall, and Negation in Open-world Knowledge Bases: A SurveyACM Computing Surveys10.1145/363956356:6(1-42)Online publication date: 5-Jan-2024
  • Show More Cited By

Index Terms

  1. Identifying the Extent of Completeness of Query Answers over Partially Complete Databases

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data completeness
    2. data quality

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS'15
    Sponsor:
    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Victoria, Melbourne, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)163
    • Downloads (Last 6 weeks)31
    Reflects downloads up to 08 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Can’t you answer while you wait?Annals of Mathematics and Artificial Intelligence10.1007/s10472-025-09968-8Online publication date: 17-Jan-2025
    • (2024)Hypothetical Answers to Continuous Queries Over Data StreamsACM Transactions on Computational Logic10.1145/368884525:4(1-40)Online publication date: 17-Aug-2024
    • (2024)Completeness, Recall, and Negation in Open-world Knowledge Bases: A SurveyACM Computing Surveys10.1145/363956356:6(1-42)Online publication date: 5-Jan-2024
    • (2022)Découverte de cardinalités maximales significatives dans des bases de connaissancesRevue Ouverte d'Intelligence Artificielle10.5802/roia.303:3-4(223-251)Online publication date: 8-Apr-2022
    • (2022)Enhancing Query Answer Completeness with Query Expansion based on Synonym PredicatesCompanion Proceedings of the Web Conference 202210.1145/3487553.3524198(354-358)Online publication date: 25-Apr-2022
    • (2022)Reconciling Communication Delays and NegationTheoretical Aspects of Computing – ICTAC 202210.1007/978-3-031-17715-6_11(151-169)Online publication date: 3-Oct-2022
    • (2022)Can You Answer While You Wait?Foundations of Information and Knowledge Systems10.1007/978-3-031-11321-5_7(111-129)Online publication date: 20-Jun-2022
    • (2022)Aggregate Query Result Correctness Using Pattern TablesDatabase Systems for Advanced Applications. DASFAA 2022 International Workshops10.1007/978-3-031-11217-1_25(347-362)Online publication date: 11-Apr-2022
    • (2021)WikinegataProceedings of the VLDB Endowment10.14778/3476311.347635014:12(2807-2810)Online publication date: 1-Jul-2021
    • (2021)Ensuring Data Readiness for Quality Requirements with Help from Procedure ReuseJournal of Data and Information Quality10.1145/342815413:3(1-15)Online publication date: 27-Apr-2021
    • Show More Cited By

    View Options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Login options

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media