skip to main content
research-article

Generating sound workflow views for correct provenance analysis

Published: 18 March 2011 Publication History

Abstract

Workflow views abstract groups of tasks in a workflow into high level composite tasks, in order to reuse subworkflows and facilitate provenance analysis. However, unless a view is carefully designed, it may not preserve the dataflow between tasks in the workflow, that is, it may not be sound. Unsound views can be misleading and cause incorrect provenance analysis.
This article studies the problem of efficiently identifying and correcting unsound workflow views with minimal changes, and constructing minimal sound and elucidative workflow views with a set of user-specified relevant tasks. In particular, two related problems are investigated. First, given a workflow view, we wish to split each unsound composite task into the minimal number of tasks, such that the resulting view is sound. Second, given a workflow and a set of user specified relevant tasks, we generate a sound view, such that each composite task contains at most one relevant task, and the total number of tasks is minimized. We prove that both problems are NP-hard by reduction from independent set. We then propose two local optimality conditions (weak and strong) for each problem, and design polynomial time algorithms for both problems to meet these conditions. Experiments show that our proposed algorithms are reasonably effective and efficient. The proposed techniques are useful for view analysis/construction for not only workflows, but general networks as well.

References

[1]
Abadi, M., Banerjee, B., Heintze, N., and Riecke, J. G. 1999. A core calculus of dependency. In Proceedings of the ACM Symposium on Principles of Programming Languages. 147--160.
[2]
Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludäscher, B., and Mock, S. 2004. Kepler: An extensible system for design and execution of scientific workflows. In Proceedings of the International Conference on Statistical and Scientific Database Management. 423--424.
[3]
Barkaoui, K., Ayed, R. B., and Shai, Z. 2007. Workflow soundness verification based on structure theory of Petri nets. Int. J. Comput. Inform. Sci. 5, 1, 51--6I.
[4]
Biton, O., Boulakia, S. C., Davidson, S. B., and Hara, C. S. 2008. Querying and managing provenance through user views in scientific workflows. In Proceedings of the International Conference on Data Engineering. 1072--1081.
[5]
Biton, O., Davidson, S. B., Khanna, S., and Roy, S. 2009. Optimizing user views for workflows. In Proceedings of the International Conference on Database Theory. 310--323.
[6]
Bobrik, R., Reichert, M., and Bauer, T. 2007. View-based process visualization. In Proceedings of the 5th International Conference on Business Process Management. 88--95.
[7]
Bowers, S., McPhillips, T. M., and Ludäscher, B. 2008. Provenance in collection-oriented scientific workflows. Concurr. Comput. 20, 5, 519--529.
[8]
Chapman, A., Jagadish, H. Y., and Ramanan, P. 2008. Efficient provenance storage. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 993--1006.
[9]
Cheney, J. 2007. Program slicing and data provenance.IEEE Data Eng. Bull. 30, 4, 22--28.
[10]
Cheney, J., Ahmed, A., and Acar, U. A. 2007. Provenance as dependency analysis. In Proceedings of the International Symposium on Database Programming Languages. 138--152.
[11]
Chiu, D. K. W., Cheung, S. C., Till, S., Karlapalem, K., Li, Q., and Kafeza, E. 2004. Workflow view driven cross-organizational interoperability in a Web service environment. Inf. Technol. Manage. 5, 3--4, 221--250.
[12]
Cohen, S., Boulakia, S. C., and Davidson, S. B. 2006. Towards a model of provenance and user views in scientific workflows. InProceedings of the International Workshop on Data Integration in the Life Sciences. 264--279.
[13]
da Cruz, S. M. S., Barros, P. M., Bisch, P. M., Campos, M. L. M., and Mattoso, M. 2008. Provenance services for distributed workflows. In Proceedings of the IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing. 526--533.
[14]
Davidson, S. B. and Freire, J. 2008. Provenance and scientific workflows: Challenges and opportunities. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1345--1350.
[15]
Eshuis, R. and Grefen, P. 2008. Constructing customized process views. Data Knowl. Eng. 64, 2, 419--438.
[16]
Heinis, T. and Alonso, G. 2008. Efficient lineage tracking for scientific workflows. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1007--1018.
[17]
Kindler, E., Martens, A., and Reisig, W. 2000. Inter-operability of workflow applications: Local criteria for global soundness. In Proceedings of the International Conference on Business Process Management. 235--253.
[18]
Liu, D.-R. and Shen, M. 2001. Modeling workflows with a process-view approach. In Proceedings of the International Conference on Database Systems for Advanced Applications. 260--267.
[19]
Liu, D.-R. and Shen, M. 2003. Workflow modeling for virtual processes: An order-preserving process-view approach. Inf. Syst. 28, 6, 505--532.
[20]
Liu, Z., Shao, Q., and Chen, Y. 2010. Searching workflows with hierarchical views. Proc. VLDB Endow., 3, 1, 918--927.
[21]
Oinn, T. M., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, R. M., Carver, T., Glover, K., Pocock, M. R, Wipat, A., and Li, P. 2004. Taverna: A tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20, 17, 3045--3054.
[22]
Rajbhandari, S., Rana, O. F., and Wootten, I. 2008. A fuzzy model for calculating workflow trust using provenance data. In Proceedings of the Mardi Gras Conference. 10.
[23]
Sadiq, W. and Orlowska, M. E. 1999. Applying graph reduction techniques for identifying structural conflicts in process models. In Proceedings of the Conference on Advanced Information Systems Engineering. 195--209.
[24]
Shan, Z., Chu, D. K. W., and Li, Q. 2005. Systematic interaction management in a workflow view based business-to-business process engine. In Proceedings of the Hawaii International Conference on System Sciences.
[25]
Shao, Q., Chen, Y., Tao, S., Yan, X., and Anerousis, N. 2008a. EasyTicket: A ticket routing recommendation engine for enterprise problem resolution. Proc. VLDB Endow. 1, 2.
[26]
Shao, Q., Chen, Y., Tao, S., Yan, X., and Anerousis, N. 2008b. Efficient ticket routing by resolution sequence mining. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 605--613.
[27]
Shao, Q., Sun, P., and Chen, Y. 2009a. Efficiently discovering critical workflows in scientific explorations. Future Generation Comp. Syst. 25, 5, 577--585.
[28]
Shao, Q., Sun, P., and Chen, Y. 2009b. WISE: A Workflow Information Search Engine. In Proceedings of the International Conference on Data Engineering. 1491--1494.
[29]
Siegeris, J. and Zimmermann, A. 2006. Workflow model compositions preserving relaxed soundness. In Proceedings of the International Conference on Business Process Management. 177--192.
[30]
Simmhan, Y. L., Plale, B., and Gannon, D. 2006. A framework for collecting provenance in data-centric scientific workflows. In Proceedings of the IEEE International Conference on Web Services. 427--436.
[31]
Sjölander, K. 2004. Phylogenomic inference of protein molecular function: Advances and challenges. Bioinformatics 20, 2, 170--179.
[32]
Sun, P., Liu, Z., Davidson, S. B., and Chen, Y. 2009. Detecting and resolving unsound workflow views for correct provenance analysis. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 549 562.
[33]
Sun, P., Liu, Z., Natarajan, S., Davidson, S. B., and Chen, Y. 2009. WOLVES: Achieving correct provenance analysis by detecting and resolving unsound workflow views. Proc. VLDB Endow. 2, 2, 1514--1517.
[34]
van der Aalst. W. M. P. 2000. Workflow verification: Finding control-flow errors using Petri-net-based techniques. In Proceedings of the International Conference on Business Process Management. 161--183.
[35]
Vanhatalo, J., Volzer, H., and Leymann, F. 2007. Faster and more focused control-flow analysis for business process models through SESE decomposition. In Proceedings of the International Conference on Service Oriented Computing. 43--55.
[36]
Weiser. M. 1981. Program slicing. In Proceedings of the International Conference on Software Engineering. 439--449.

Cited By

View all
  • (2020)Abstracting PROV provenance graphs: A validity-preserving approachFuture Generation Computer Systems10.1016/j.future.2020.05.015111(352-367)Online publication date: Oct-2020
  • (2016)OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction dataJournal of Biomedical Semantics10.1186/s13326-016-0064-27:1Online publication date: 10-May-2016
  • (2015)Towards web-scale how-provenance2015 31st IEEE International Conference on Data Engineering Workshops10.1109/ICDEW.2015.7129547(68-70)Online publication date: Apr-2015
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 36, Issue 1
March 2011
251 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1929934
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 March 2011
Accepted: 01 September 2010
Revised: 01 August 2010
Received: 01 May 2010
Published in TODS Volume 36, Issue 1

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. network
  2. provenance
  3. soundness
  4. view
  5. workflow

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 07 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2020)Abstracting PROV provenance graphs: A validity-preserving approachFuture Generation Computer Systems10.1016/j.future.2020.05.015111(352-367)Online publication date: Oct-2020
  • (2016)OmniSearch: a semantic search system based on the Ontology for MIcroRNA Target (OMIT) for microRNA-target gene interaction dataJournal of Biomedical Semantics10.1186/s13326-016-0064-27:1Online publication date: 10-May-2016
  • (2015)Towards web-scale how-provenance2015 31st IEEE International Conference on Data Engineering Workshops10.1109/ICDEW.2015.7129547(68-70)Online publication date: Apr-2015
  • (2015)Efficacy and safety of acupuncture in children: an overview of systematic reviewsPediatric Research10.1038/pr.2015.9178:2(112-119)Online publication date: 7-May-2015
  • (2015)A Novel Search Engine Supporting Specific Drug Queries and Literature Management9th International Conference on Practical Applications of Computational Biology and Bioinformatics10.1007/978-3-319-19776-0_11(99-106)Online publication date: 19-May-2015
  • (2013)A core calculus for provenanceJournal of Computer Security10.5555/2595044.259505021:6(919-969)Online publication date: 1-Nov-2013
  • (2013)Performance evaluation of parallel strategies in public cloudsFuture Generation Computer Systems10.1016/j.future.2012.12.01929:7(1816-1825)Online publication date: 1-Sep-2013
  • (2012)Hierarchical models of provenanceProceedings of the 4th USENIX conference on Theory and Practice of Provenance10.5555/2342875.2342885(10-10)Online publication date: 14-Jun-2012
  • (2012)Alcohol Withdrawal SyndromeCritical Care Clinics10.1016/j.ccc.2012.07.00428:4(549-585)Online publication date: Oct-2012

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media