DOI: 10.1145/1645164.1645168
research-article

Pipeline-centric provenance model

Published: 16 November 2009

ABSTRACT

In this paper we propose a new provenance model that is tailored to a class of workflow-based applications. We motivate the approach with use cases from the astronomy community. We then generalize the class of applications to which the approach is relevant and propose a pipeline-centric provenance model. Finally, we evaluate the storage benefits of the approach when it is applied to an astronomy application.


  • Published in

    WORKS '09: Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science
    November 2009
    136 pages
    ISBN: 9781605587172
    DOI: 10.1145/1645164

    Copyright © 2009 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    Overall Acceptance Rate: 30 of 54 submissions, 56%
