DOI: 10.1145/3221269.3221298
Short paper

Scheduling data-intensive scientific workflows with reduced communication

Published: 09 July 2018

ABSTRACT

Data-intensive scientific workflows, typically modelled by directed acyclic graphs, consist of inter-dependent tasks that exchange significant amounts of data and are executed on parallel/distributed clusters. The energy or monetary costs associated with large data transfers between tasks executing on different nodes may be significant; as a result, there is scope to trade some communication for computation in order to reduce overall communication costs. In this work, we propose a scheduling approach that scales the weight of communication to increase its impact when building the schedule of a scientific workflow; the aim is to assign pairs of tasks with significant data transfers to the same computational node so that the overall communication cost is minimized. The proposed approach is evaluated using simulation and three real-world scientific workflows. The tradeoff between workflow execution time and the size of data transfers is assessed for different weights and different numbers of computational nodes.
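The abstract only outlines the idea, so the following is a hypothetical sketch, not the paper's actual algorithm: a HEFT-style list scheduler in which a factor `alpha` scales edge (communication) weights both when ranking tasks and when computing earliest finish times. Larger `alpha` values penalize cross-node transfers more heavily, pushing heavily communicating task pairs onto the same node. All task names, cost values, and the `upward_rank`/`schedule` helpers are illustrative assumptions.

```python
# Hypothetical HEFT-style sketch: communication weights are scaled by `alpha`
# so that task pairs with large data transfers tend to be co-located.

def upward_rank(task, comp, data, succs, alpha, memo):
    """Upward rank of a task, with edge weights scaled by alpha."""
    if task in memo:
        return memo[task]
    r = comp[task] + max(
        (alpha * data[(task, s)] + upward_rank(s, comp, data, succs, alpha, memo)
         for s in succs.get(task, [])),
        default=0.0,
    )
    memo[task] = r
    return r

def schedule(tasks, comp, data, preds, succs, n_nodes, alpha):
    """Greedy earliest-finish-time assignment in decreasing rank order."""
    memo = {}
    order = sorted(tasks, key=lambda t: -upward_rank(t, comp, data, succs, alpha, memo))
    node_free = [0.0] * n_nodes          # time at which each node becomes free
    placement, finish = {}, {}
    for t in order:
        best = None
        for n in range(n_nodes):
            # data from a predecessor on the same node costs nothing
            ready = max(
                (finish[p] + (0.0 if placement[p] == n else alpha * data[(p, t)])
                 for p in preds.get(t, [])),
                default=0.0,
            )
            eft = max(ready, node_free[n]) + comp[t]
            if best is None or eft < best[0]:
                best = (eft, n)
        finish[t], placement[t] = best[0], best[1]
        node_free[best[1]] = best[0]
    return placement, max(finish.values())

# Tiny fork-join example (illustrative numbers): edge a->b carries far more
# data than a->c, so a large alpha should co-locate a and b.
tasks = ["a", "b", "c", "d"]
comp = {t: 1.0 for t in tasks}
data = {("a", "b"): 10.0, ("a", "c"): 1.0, ("b", "d"): 5.0, ("c", "d"): 1.0}
preds = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

pl0, ms0 = schedule(tasks, comp, data, preds, succs, 2, alpha=0.0)
pl5, ms5 = schedule(tasks, comp, data, preds, succs, 2, alpha=5.0)
```

On this toy DAG, `alpha=0.0` spreads the tasks over both nodes for a shorter makespan, while `alpha=5.0` places all four tasks on one node, eliminating transfers at the cost of a longer schedule, which is the kind of tradeoff the paper evaluates.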


Published in:
SSDBM '18: Proceedings of the 30th International Conference on Scientific and Statistical Database Management
July 2018, 314 pages
ISBN: 9781450365055
DOI: 10.1145/3221269
Copyright © 2018 ACM

Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

SSDBM '18 paper acceptance rate: 30 of 75 submissions (40%). Overall acceptance rate: 56 of 146 submissions (38%).
