ABSTRACT
In this paper we propose a new provenance model tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We then generalize the class of applications to which the approach applies and propose a pipeline-centric provenance model. Finally, we evaluate the benefits of the approach, in terms of the storage it requires, when applied to an astronomy application.