skip to main content
10.1145/2396761.2398439acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Efficient provenance storage for relational queries

Published:29 October 2012Publication History

ABSTRACT

Provenance information is vital in many application areas as it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information relating to data derived via database queries. In particular, we first propose a provenance tree data structure which matches the query structure and thereby presents a possibility to avoid redundant storage of information regarding the derivation process. Then we investigate two approaches for reducing storage costs. The first approach utilizes two ingenious rules to achieve reduction on provenance trees. The second one is a dynamic programming solution, which provides a way of optimizing the selection of query tree nodes where provenance information should be stored. The optimization algorithm runs in polynomial time in the query size and is linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches guarantee significantly lower storage costs than existing approaches.

References

  1. Y. Amsterdamer, D. Deutch, T. Milo, and V. Tannen. On provenance minimization. In PODS, pages 141--152, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. O. Benjelloun, A. Sarma, A. Halevy, and J. Widom. Uldbs: Databases with uncertainty and lineage. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Bhagwat, L. Chiticariu, W. C. Tan, and G. Vijayvargiya. An annotation management system for relational databases. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. R. Bose and J. Frew. Lineage retrieval for scientific data processing: a survey. ACM Computing Surveys, 37, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. P. Buneman, A. Chapman, and J. Cheney. Provenance management in curated databases. In SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Chapman, H. V. Jagadish, and P. Ramanan. Efficient provenance storage. In SIGMOD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Cheney, L. Chiticariu, and W. C. Tan. Provenance in databases: Why, how, and where. Foundations and Trends in Databases, 1(4):379--474, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Church and J. B. Rosser. Some properties of conversion. Transactions of the American Mathematical Society,39(3), 1936.Google ScholarGoogle Scholar
  10. Y. Cui and J. Widom. Practical lineage tracing in data warehouses. In ICDE, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  11. Y. Cui and J. Widom. Lineage tracing for general data warehouse transformations. VLDB J., 12(1), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Glavic and G. Alonso. Perm: Processing provenance and data on the same data model through query rewriting. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In VLDB, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. G. Karvounarakis, Z. G. Ives, and V. Tannen. Querying data provenance. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. T. Liu and M. J. Franklin. The design of griddb: A data-centric overlay for the scientific grid. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Olteanu and J. Zavodny. Factorised representations of query results: Size bounds and readability. In ICDT, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. Srivastava and Y. Velegrakis. Intensional associations between data and metadata. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. http://www.comp.nus.edu.sg/ baozhife/ptree.Google ScholarGoogle Scholar
  20. J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In CIDR, 2005.Google ScholarGoogle Scholar
  21. A. Woodruff and M. Stonebraker. Supporting fine-grained data lineage in a database visualization environment. In ICDE, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Efficient provenance storage for relational queries

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
      October 2012
      2840 pages
      ISBN:9781450311564
      DOI:10.1145/2396761

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 29 October 2012

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate1,861of8,427submissions,22%

      Upcoming Conference

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader