skip to main content
10.1145/1514894.1514899acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article
Free Access

Repair checking in inconsistent databases: algorithms and complexity

Published:23 March 2009Publication History

ABSTRACT

Managing inconsistency in databases has long been recognized as an important problem. One of the most promising approaches to coping with inconsistency in databases is the framework of database repairs, which has been the topic of an extensive investigation over the past several years. Intuitively, a repair of an inconsistent database is a consistent database that differs from the given inconsistent database in a minimal way. So far, most of the work in this area has addressed the problem of obtaining the consistent answers to a query posed on an inconsistent database. Repair checking is the following decision problem: given two databases r and r', is r' a repair of r? Although repair checking is a fundamental algorithmic problem about inconsistent databases, it has not received as much attention as consistent query answering. In this paper, we give a polynomial-time algorithm for subset-repair checking under integrity constraints that are the union of a weakly acyclic set of local-as-view (LAV) tuple-generating dependencies and a set of equality-generating dependencies. This result significantly generalizes earlier work for subset-repair checking when the integrity constraints are the union of an acyclic set of inclusion dependencies and a set of functional dependencies. We also give a polynomial-time algorithm for symmetric-difference repair checking, when the integrity constraints form a weakly acyclic set of LAV tgds. After this, we establish a number of complexity-theoretic results that delineate the boundary between tractability and intractability for the repair-checking problem. Specifically, we show that the aforementioned tractability results are optimal; in particular, subset-repair checking for arbitrary weakly acyclic sets of tuple-generating dependencies is a coNP-complete problem. We also study cardinality-based repairs and show that cardinality-repair checking is coNP-complete for various classes of integrity constraints encountered in database design and data exchange.

References

  1. M. Arenas, L. Bertossi, and J. Chomicki. Consistent query answers in inconsistent databases. In 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'99), pages 68--79, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. O. Benjelloun, H. Garcia-Molina, H. Gong, H. Kawai, T. E. Larson, D. Menestrina, and S. Thavisomboon. D-swoosh: A family of algorithms for generic, distributed entity resolution. In ICDCS, page 37, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. E. Bertossi. Consistent query answering in databases. SIGMOD Record, 35(2):68--76, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Data cleansing for numerical data sets. In SEBD, pages 292--299, 2005.Google ScholarGoogle Scholar
  5. L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Fixing inconsistent databases by updating numerical attributes. In DEXA Workshops, pages 854--858, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  6. L. E. Bertossi, L. Bravo, E. Franconi, and A. Lopatenko. Fixing inconsistent databases by updating numerical attributes. In DEXA Workshops, pages 854--858, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  7. P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In ICDE, pages 746--755, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  8. P. Bohannon, M. Flaster, W. Fan, and R. Rastogi. A cost-based model and effective heuristic for repairing constraints by value modification. In SIGMOD Conference, pages 143--154, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Chomicki. Consistent query answering: Five easy pieces. In ICDT, pages 1--17, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Chomicki and J. Marcinkowski. Minimal-change integrity maintenance using tuple deletions. Inf. Comput., 197(1/2):90--121, 2005.Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Deutsch and V. Tannen. Reformulation of XML Queries and Constraints. In ICDT, pages 225--241, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Eckerson. Data quality and the bottom line: Achieving business success through a commitment to high quality data. Technical report, The Data Warehousing Institute, 2002. http://www.tdwi.org/research/display.aspx?ID=6064.Google ScholarGoogle Scholar
  13. R. Fagin, P. G. Kolaitis, R. J. Miller, and L. Popa. Data exchange: semantics and query answering. Theor. Comput. Sci., 336(1):89--124, 2005. Preliminary version in ICDT 2003. Google ScholarGoogle ScholarCross RefCross Ref
  14. E. Franconi, A. L. Palma, N. Leone, S. Perri, and F. Scarcello. Census data repair: a challenging application of disjunctive logic programming. In LPAR, pages 561--578, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Fuxman, P. G. Kolaitis, R. J. Miller, and W. C. Tan. Peer data exchange. ACM Trans. Database Syst., 31(4):1454--1498, 2006. Preliminary version in PODS 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Improving data cleaning quality using a data lineage facility. In DMDW, page 3, 2001.Google ScholarGoogle Scholar
  17. G. Gottlob and A. Nash. Data exchange: computing cores in polynomial time. In PODS, pages 40--49, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Greenlaw, H. J. Hoover, and W. L. Ruzzo. Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. A. Hernández and S. J. Stolfo. Real-world data is dirty: Data cleansing and the merge/purge problem. Data Min. Knowl. Discov., 2(1):9--37, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. P. G. Kolaitis. Schema mappings, data exchange, and metadata management. In PODS, pages 61--75, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. M. Lenzerini. Data Integration: A Theoretical Perspective. pages 233--246, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Lopatenko and L. E. Bertossi. Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In ICDT, pages 179--193, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Menestrina, O. Benjelloun, and H. Garcia-Molina. Generic entity resolution with data confidences. In CleanDB, 2006.Google ScholarGoogle Scholar
  24. C. Moore and J. M. Robson. Hard tiling problems with simple tiles. Discrete and Computational Geometry, 26(4):573--590, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. E. Rahm and H. Do. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin, 23(4), 2000.Google ScholarGoogle Scholar
  26. V. Raman and J. M. Hellerstein. Potter's wheel: An interactive data cleaning system. In VLDB, pages 381--390, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Staworko. Declarative Inconsistency Handling in Relational and Semi-structured databases. PhD thesis, May 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Wijsen. Database repairing using updates. ACM Trans. Database Syst., 30(3):722--768, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Repair checking in inconsistent databases: algorithms and complexity

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader