ABSTRACT
Managing inconsistency in databases has long been recognized as an important problem. One of the most promising approaches to coping with inconsistency in databases is the framework of database repairs, which has been the topic of an extensive investigation over the past several years. Intuitively, a repair of an inconsistent database is a consistent database that differs from the given inconsistent database in a minimal way. So far, most of the work in this area has addressed the problem of obtaining the consistent answers to a query posed on an inconsistent database. Repair checking is the following decision problem: given two databases r and r', is r' a repair of r? Although repair checking is a fundamental algorithmic problem about inconsistent databases, it has not received as much attention as consistent query answering. In this paper, we give a polynomial-time algorithm for subset-repair checking under integrity constraints that are the union of a weakly acyclic set of local-as-view (LAV) tuple-generating dependencies and a set of equality-generating dependencies. This result significantly generalizes earlier work for subset-repair checking when the integrity constraints are the union of an acyclic set of inclusion dependencies and a set of functional dependencies. We also give a polynomial-time algorithm for symmetric-difference repair checking, when the integrity constraints form a weakly acyclic set of LAV tgds. After this, we establish a number of complexity-theoretic results that delineate the boundary between tractability and intractability for the repair-checking problem. Specifically, we show that the aforementioned tractability results are optimal; in particular, subset-repair checking for arbitrary weakly acyclic sets of tuple-generating dependencies is a coNP-complete problem. We also study cardinality-based repairs and show that cardinality-repair checking is coNP-complete for various classes of integrity constraints encountered in database design and data exchange.
- M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'99), pages 68--79, 1999. Google ScholarDigital Library
- O. Benjelloun, H. Garcia-Molina, H. Gong, H. Kawai, T. E. Larson, D. Menestrina, and S. Thavisomboon. D-swoosh: A family of algorithms for generic, distributed entity resolution. In ICDCS, page 37, 2007. Google ScholarDigital Library
- L. E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68--76, 2006. Google ScholarDigital Library
- L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Data cleansing for numerical data sets. In SEBD, pages 292--299, 2005.Google Scholar
- L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Fixing inconsistent databases by updating numerical attributes. In DEXA Workshops, pages 854--858, 2005.Google ScholarCross Ref
- L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Fixing inconsistent databases by updating numerical attributes. In DEXA Workshops, pages 854--858, 2005.Google ScholarCross Ref
- P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarCross Ref
- P. Bohannon, M. Flaster, W. Fan, and R. Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD Conference, pages 143--154, 2005. Google ScholarDigital Library
- J. Chomicki. Consistent query answering: Five easy pieces. In ICDT, pages 1--17, 2007. Google ScholarDigital Library
- J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1/2):90--121, 2005.Google ScholarDigital Library
- A. Deutsch and V. Tannen. Reformulation of XML Queries and Constraints. In ICDT, pages 225--241, 2003. Google ScholarDigital Library
- W. Eckerson. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Technical report, The Data Warehousing Institute, 2002. http://www.tdwi.org/research/display.aspx?ID=6064.Google Scholar
- R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89--124, 2005. Preliminary version in ICDT 2003. Google ScholarCross Ref
- E. Franconi, A. L. Palma, N. Leone, S. Perri, and F. Scarcello. Census data repair: a challenging application of disjunctive logic programming. In LPAR, pages 561--578, 2001. Google ScholarDigital Library
- A. Fuxman, P. G. Kolaitis, R. J. Miller, and W. C. Tan. Peer data exchange. ACM Trans. Database Syst., 31(4):1454--1498, 2006. Preliminary version in PODS 2005. Google ScholarDigital Library
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Improving data cleaning quality using a data lineage facility. In DMDW, page 3, 2001.Google Scholar
- G. Gottlob and A. Nash. Data exchange: computing cores in polynomial time. In PODS, pages 40--49, 2006. Google ScholarDigital Library
- R. Greenlaw, H. J. Hoover, and W. L. Ruzzo. Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, 1995. Google ScholarDigital Library
- M. A. Hernández and S. J. Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Min. Knowl. Discov., 2(1):9--37, 1998. Google ScholarDigital Library
- P. G. Kolaitis. Schema mappings, data exchange, and metadata management. In PODS, pages 61--75, 2005. Google ScholarDigital Library
- M. Lenzerini. Data Integration: A Theoretical Perspective. pages 233--246, 2002. Google ScholarDigital Library
- A. Lopatenko and L. E. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In ICDT, pages 179--193, 2007. Google ScholarDigital Library
- D. Menestrina, O. Benjelloun, and H. Garcia-Molina. Generic entity resolution with data confidences. In CleanDB, 2006.Google Scholar
- C. Moore and J. M. Robson. Hard tiling problems with simple tiles. Discrete and Computational Geometry, 26(4):573--590, 2001.Google ScholarDigital Library
- E. Rahm and H. Do. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23(4), 2000.Google Scholar
- V. Raman and J. M. Hellerstein. Potter's wheel: An interactive data cleaning system. In VLDB, pages 381--390, 2001. Google ScholarDigital Library
- S. Staworko. Declarative Inconsistency Handling in Relational and Semi-structured databases. PhD thesis, May 2007. Google ScholarDigital Library
- J. Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722--768, 2005. Google ScholarDigital Library
Index Terms
- Repair checking in inconsistent databases: algorithms and complexity
Recommendations
Dichotomies in the Complexity of Preferred Repairs
PODS '15: Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsThe framework of database repairs provides a principled approach to managing inconsistencies in databases. Informally, a repair of an inconsistence database is a consistent database that differs from the inconsistent one in a "minimal way." A ...
Repair localization for query answering from inconsistent databases
Query answering from inconsistent databases amounts to finding “meaningful” answers to queries posed over database instances that do not satisfy integrity constraints specified over their schema. A declarative approach to this problem relies on the ...
On the tractability and intractability of consistent conjunctive query answering
PhD '11: Proceedings of the 2011 Joint EDBT/ICDT Ph.D. WorkshopThe consistent query answering framework has received considerable attention since it was first introduced as an alternative to coping with inconsistent databases. The framework was defined based on two notions: repairs and consistent query answers. ...
Comments