ABSTRACT
We describe an approach for pipelining nested data collections in scientific workflows. Our approach logically delimits arbitrarily nested collections of data tokens using special, paired control tokens inserted into token streams, and provides workflow components with high-level operations for managing these collections. Our framework provides new capabilities for: (1) concurrent operation on collections; (2) on-the-fly customization of workflow component behavior; (3) improved handling of exceptions and faults; and (4) transparent passing of provenance and metadata within token streams. We demonstrate our approach using a workflow for inferring phylogenetic trees. We also describe future extensions to support richer typing mechanisms for facilitating sharing and reuse of workflow components between disciplines. This work represents a step towards our larger goal of exploiting collection-oriented dataflow programming as a new paradigm for scientific workflow systems, an approach we believe will significantly reduce the complexity of creating and reusing workflows and workflow components.
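The delimiting idea can be illustrated with a minimal sketch. It is not the framework's actual API: the names `Open`, `Close`, `Collection`, `to_stream`, and `from_stream` are hypothetical, and the sketch assumes each control token carries only a collection name. It shows a nested collection flattened into a single token stream with paired control tokens, then rebuilt on the receiving side.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, List


@dataclass(frozen=True)
class Open:
    """Hypothetical control token marking the start of a collection."""
    name: str


@dataclass(frozen=True)
class Close:
    """Hypothetical control token marking the end of a collection."""
    name: str


@dataclass
class Collection:
    """Hypothetical in-memory form of a (possibly nested) collection."""
    name: str
    items: List[Any] = field(default_factory=list)


def to_stream(coll: Collection) -> Iterator[Any]:
    """Flatten a nested collection into a flat, delimited token stream."""
    yield Open(coll.name)
    for item in coll.items:
        if isinstance(item, Collection):
            yield from to_stream(item)  # nested collection: recurse
        else:
            yield item                  # ordinary data token
    yield Close(coll.name)


def from_stream(tokens: Iterable[Any]) -> Collection:
    """Rebuild the nested collection, checking that delimiters pair up."""
    stack: List[Collection] = []
    root = None
    for tok in tokens:
        if isinstance(tok, Open):
            if root is not None:
                raise ValueError("token after the outermost collection closed")
            stack.append(Collection(tok.name))
        elif isinstance(tok, Close):
            if not stack or stack[-1].name != tok.name:
                raise ValueError(f"unmatched closing token: {tok.name}")
            done = stack.pop()
            if stack:
                stack[-1].items.append(done)  # attach to enclosing collection
            else:
                root = done                   # outermost collection finished
        else:
            if not stack:
                raise ValueError("data token outside any collection")
            stack[-1].items.append(tok)
    if stack or root is None:
        raise ValueError("stream ended with unclosed collections")
    return root


# Example: an outer collection holding two data tokens and one nested collection.
nested = Collection("project", ["a", Collection("alignment", ["seq1", "seq2"]), "b"])
stream = list(to_stream(nested))
rebuilt = from_stream(stream)
```

Because the delimiters travel in the same stream as the data, a downstream component in this sketch can begin consuming a collection before its closing token has arrived, which is one way to picture the pipelined, concurrent operation on collections described above.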