DOI: 10.1145/2479832.2479848
research-article

Detecting common scientific workflow fragments using templates and execution provenance

Published: 23 June 2013

Abstract

Provenance plays a major role in understanding and reusing the methods applied in a scientific experiment, as it records the inputs, the processes carried out, and the use and generation of intermediate and final results. In the specific case of in-silico scientific experiments, a large variety of scientific workflow systems (e.g., Wings, Taverna, Galaxy, VisTrails) have been created to support scientists. All of these systems produce some form of provenance about the executions of the workflows that encode scientific experiments. However, provenance is normally recorded at a very low level of detail, which makes it hard to understand what happened during execution. In this paper we propose an approach to automatically obtain abstractions from low-level provenance data by finding common workflow fragments in workflow execution provenance and relating them to templates. We have tested our approach on a dataset of workflows published by the Wings workflow system. Our results show that these abstractions highlight the most common abstract methods used in the executions of a repository and relate different runs and workflow templates to each other.

    Published In

    K-CAP '13: Proceedings of the seventh international conference on Knowledge capture
    June 2013
    160 pages
    ISBN:9781450321020
    DOI:10.1145/2479832
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. abstraction
    2. provenance
    3. scientific workflow
    4. wings

    Conference

    K-CAP 2013: Knowledge Capture Conference
    June 23 - 26, 2013
    Banff, Canada

    Acceptance Rates

    K-CAP '13 Paper Acceptance Rate 13 of 60 submissions, 22%;
    Overall Acceptance Rate 55 of 198 submissions, 28%

    Cited By

    • (2023) Workflow analysis of data science code in public GitHub repositories. Empirical Software Engineering, 28:1. DOI: 10.1007/s10664-022-10229-z. Online publication date: 1-Jan-2023
    • (2020) Integrating Quantum Computing into Workflow Modeling and Execution. 2020 IEEE/ACM 13th International Conference on Utility and Cloud Computing (UCC), pages 279-291. DOI: 10.1109/UCC48980.2020.00046. Online publication date: Dec-2020
    • (2020) Topic-based crossing-workflow fragment discovery. Future Generation Computer Systems, 112, pages 1141-1155. DOI: 10.1016/j.future.2020.05.029. Online publication date: Nov-2020
    • (2020) Systematizing scientific laboratory work by a workflow and template for electronic laboratory notebooks. Education for Chemical Engineers, 31, pages 42-53. DOI: 10.1016/j.ece.2020.03.004. Online publication date: Apr-2020
    • (2019) Does Diversity Affect User Satisfaction in Image Search. ACM Transactions on Information Systems, 37:3, pages 1-30. DOI: 10.1145/3320118. Online publication date: 8-May-2019
    • (2019) trackr: A Framework for Enhancing Discoverability and Reproducibility of Data Visualizations and Other Artifacts in R. Journal of Computational and Graphical Statistics, pages 1-15. DOI: 10.1080/10618600.2019.1585259. Online publication date: 18-Mar-2019
    • (2018) Cross-domain similarity assessment for workflow improvement to handle Big Data challenge in workflow management. Journal of Big Data, 5:1. DOI: 10.1186/s40537-018-0135-6. Online publication date: 23-Jul-2018
    • (2018) Cross-domain graph based similarity measurement of workflows. Journal of Big Data, 5:1. DOI: 10.1186/s40537-018-0127-6. Online publication date: 24-May-2018
    • (2018) A survey of simulation provenance systems. Human-centric Computing and Information Sciences, 8:1, pages 1-29. DOI: 10.1186/s13673-018-0150-9. Online publication date: 1-Dec-2018
    • (2018) Reproducibility in Scientific Computing. ACM Computing Surveys, 51:3, pages 1-36. DOI: 10.1145/3186266. Online publication date: 16-Jul-2018
