ABSTRACT
Provenance information is vital in many application areas because it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information for data derived via database queries. We first propose a provenance tree data structure that matches the query structure and thereby makes it possible to avoid redundant storage of information about the derivation process. We then investigate two approaches for reducing storage costs. The first applies two reduction rules to provenance trees. The second is a dynamic programming solution that optimizes the selection of query tree nodes at which provenance information should be stored. The optimization algorithm runs in time polynomial in the query size and linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches achieve significantly lower storage costs than existing approaches.
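The abstract does not state the cost model or the recurrence used by the dynamic programming solution, so the following is only a minimal sketch under an assumed, simplified model and should not be read as the paper's algorithm. It assumes that each query tree node carries a hypothetical storage cost (`store_cost`), and that provenance must be stored either at a node or at nodes covering all of its children. The names `QueryNode`, `store_cost`, and `choose_storage_nodes` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class QueryNode:
    """One operator in a query tree (hypothetical structure)."""
    name: str
    store_cost: int  # assumed cost of storing provenance at this node
    children: List["QueryNode"] = field(default_factory=list)


def choose_storage_nodes(node: QueryNode) -> Tuple[int, List[str]]:
    """Return (minimum total cost, chosen node names) for this subtree.

    Assumed recurrence: best(v) = min(store_cost(v), sum of best(c)
    over the children c of v). A leaf has no alternative, so it stores.
    """
    if not node.children:
        return node.store_cost, [node.name]

    children_cost = 0
    children_choice: List[str] = []
    for child in node.children:
        cost, chosen = choose_storage_nodes(child)
        children_cost += cost
        children_choice.extend(chosen)

    # Store here only if that is no more expensive than storing below.
    if node.store_cost <= children_cost:
        return node.store_cost, [node.name]
    return children_cost, children_choice


# Toy query tree: project(join(select(scan_r), scan_s)), made-up costs.
scan_r = QueryNode("scan_r", 40)
select_r = QueryNode("select_r", 20, [scan_r])
scan_s = QueryNode("scan_s", 25)
join = QueryNode("join", 120, [select_r, scan_s])
root = QueryNode("project", 50, [join])

total_cost, chosen_nodes = choose_storage_nodes(root)
print(total_cost, chosen_nodes)
```

Each node is visited once, so this toy version runs in time linear in the number of query tree nodes. The costs in the example are invented for illustration only.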
Index Terms
- Efficient provenance storage for relational queries