DOI: 10.1145/1341811.1341822
research-article

Workflow task clustering for best effort systems with Pegasus

Published: 29 January 2008

Abstract

Many scientific workflows consist of thousands of fine-grained computational tasks and are data-intensive, so they require resources such as the TeraGrid to execute efficiently. To improve the performance of such applications, we often employ task clustering techniques, which increase the computational granularity of workflow tasks. The goal is to minimize the completion time of the workflow by reducing the impact of queue wait times. In this paper, we examine the performance impact of these clustering techniques using the Pegasus workflow management system. Experiments performed with an astronomy workflow on the NCSA TeraGrid cluster show that clustering can reduce workflow completion time significantly, by up to 97%.
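The abstract's argument is that every job submitted to a batch queue pays a queue wait, so merging many short tasks into fewer, longer jobs amortizes that wait. The following is a minimal Python sketch of one such scheme, level-based ("horizontal") clustering, in which independent tasks at the same depth of the workflow DAG are grouped into jobs. The function names, the fixed per-job queue wait, the slot limit and all numbers are hypothetical illustrations chosen for this sketch; they are not taken from the paper, nor from Pegasus's actual implementation or configuration.

```python
from collections import defaultdict
from math import ceil


def task_levels(tasks, parents):
    """Return {task: level}; tasks with no parents are on level 1."""
    level = {}

    def depth(t):
        if t not in level:
            # A task sits one level below its deepest parent.
            level[t] = 1 + max((depth(p) for p in parents.get(t, ())), default=0)
        return level[t]

    for t in tasks:
        depth(t)
    return level


def horizontal_clusters(tasks, parents, cluster_size):
    """Split each level into groups of at most cluster_size tasks.

    Tasks on the same level never depend on each other, so each group
    can be submitted to the batch queue as one job.
    """
    level = task_levels(tasks, parents)
    by_level = defaultdict(list)
    for t in tasks:
        by_level[level[t]].append(t)
    clusters = []
    for lvl in sorted(by_level):
        members = by_level[lvl]
        for i in range(0, len(members), cluster_size):
            clusters.append(members[i:i + cluster_size])
    return clusters


def toy_makespan(level_sizes, cluster_size, task_runtime, queue_wait, slots):
    """Hypothetical cost model, not the paper's.

    Assumes identical task runtimes, a fixed queue wait per job, at most
    `slots` jobs running at once, and a barrier between levels. Each wave
    of up to `slots` jobs pays one queue wait plus the runtime of its
    largest job, whose tasks run one after another.
    """
    total = 0
    for n_tasks in level_sizes:
        n_jobs = ceil(n_tasks / cluster_size)
        waves = ceil(n_jobs / slots)
        job_runtime = min(cluster_size, n_tasks) * task_runtime
        total += waves * (queue_wait + job_runtime)
    return total


if __name__ == "__main__":
    levels = [1000, 1000, 1]  # hypothetical: two wide levels, then one merge task
    for size in (1, 20, 1000):
        print(size, toy_makespan(levels, size, task_runtime=10, queue_wait=300, slots=50))
    # prints: "1 12710", then "20 1310", then "1000 20910"
```

In this toy model, clusters of 20 tasks cut the estimate from 12710 s to 1310 s, while a single cluster per level (cluster_size=1000) serializes each level and raises it to 20910 s, so the cluster size trades queue overhead against parallelism. Real queue waits are variable rather than fixed, so these numbers only show the direction of the effect, not its size.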

Information & Contributors

Published In

ACM Other conferences
MG '08: Proceedings of the 15th ACM Mardi Gras conference: From lightweight mash-ups to lambda grids: Understanding the spectrum of distributed computing requirements, applications, tools, infrastructures, interoperability, and the incremental adoption of key capabilities
January 2008
178 pages
ISBN:9781595938350
DOI:10.1145/1341811
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • National e-Science Institute (Edinburgh, UK)
  • Louisiana State University (USA)

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. best effort systems
  2. queue wait time
  3. task clustering
  4. workflow clustering

Conference

Mardi Gras'08: 15th Mardi Gras Conference on Distributed Applications
January 29 - February 3, 2008
Baton Rouge, Louisiana, USA

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)3
Reflects downloads up to 17 Feb 2025

Cited By

  • (2023) Fair-Share Methods for Scheduling Scientific Workflows in Cloud. 2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA), pp. 1-8. DOI: 10.1109/AICCSA59173.2023.10479262. Online publication date: 4-Dec-2023.
  • (2022) Taming System Dynamics on Resource Optimization for Data Processing Workflows: A Probabilistic Approach. IEEE Transactions on Parallel and Distributed Systems, vol. 33(1), pp. 231-248. DOI: 10.1109/TPDS.2021.3091400. Online publication date: 1-Jan-2022.
  • (2022) Energy-aware scientific workflow scheduling in cloud environment. Cluster Computing, vol. 25(6), pp. 3845-3874. DOI: 10.1007/s10586-022-03613-3. Online publication date: 18-May-2022.
  • (2021) Optimizing Workflow Task Clustering Using Reinforcement Learning. IEEE Access, vol. 9, pp. 110614-110626. DOI: 10.1109/ACCESS.2021.3101454. Online publication date: 2021.
  • (2021) Energy-Based Comparison for Workflow Task Clustering Techniques. Intelligent Systems Design and Applications, pp. 526-535. DOI: 10.1007/978-3-030-71187-0_49. Online publication date: 3-Jun-2021.
  • (2019) Data-Intensive Workflow Management: For Clouds and Data-Intensive and Scalable Computing Environments. Synthesis Lectures on Data Management, vol. 14(4), pp. 1-179. DOI: 10.2200/S00915ED1V01Y201904DTM060. Online publication date: 13-May-2019.
  • (2019) Incorporating Probabilistic Optimizations for Resource Provisioning of Data Processing Workflows. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337847. Online publication date: 5-Aug-2019.
  • (2019) The Evolution of the Pegasus Workflow Management Software. Computing in Science & Engineering, vol. 21(4), pp. 22-36. DOI: 10.1109/MCSE.2019.2919690. Online publication date: 1-Jul-2019.
  • (2019) Fault tolerance for a scientific workflow system in a Cloud computing environment. International Journal of Computers and Applications, vol. 42(7), pp. 705-714. DOI: 10.1080/1206212X.2019.1647651. Online publication date: 30-Jul-2019.
  • (2019) New approach to allocation planning of many-task workflows on clouds. Concurrency and Computation: Practice and Experience, vol. 32(2). DOI: 10.1002/cpe.5404. Online publication date: 18-Jun-2019.
