ABSTRACT
Data-intensive scientific workflows, typically modelled as directed acyclic graphs, consist of inter-dependent tasks that exchange significant amounts of data and are executed on parallel or distributed clusters. However, the energy or monetary costs of large data transfers between tasks executing on different nodes may be significant. As a result, there is scope to trade some communication for computation in order to reduce overall communication costs. In this work, we propose a scheduling approach that scales the weight of communication to increase its impact when building the schedule of a scientific workflow; the aim is to assign pairs of tasks with significant data transfers to the same computational node so that the overall communication cost is minimized. The proposed approach is evaluated in simulation using three real-world scientific workflows. The tradeoff between workflow execution time and the size of data transfers is assessed for different weights and different numbers of computational nodes.
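The abstract does not spell out the scheduling algorithm, so the following is only a minimal sketch of the general idea, under stated assumptions: a greedy list scheduler on identical nodes in which each cross-node transfer is multiplied by a hypothetical factor `alpha` when choosing a node, while the actual start times still use the unscaled transfer time. The function name `schedule`, the parameter `alpha`, and the toy workflow are illustrative and are not taken from the paper.

```python
from collections import defaultdict


def schedule(tasks, edges, num_nodes, alpha=1.0):
    """Greedy list scheduling with scaled communication weights (illustrative).

    tasks: dict task -> compute time, listed in a topological order (assumption).
    edges: dict (u, v) -> transfer time, paid only if u and v run on different nodes.
    alpha: hypothetical scaling factor applied to transfers when ranking nodes.
    """
    preds = defaultdict(list)
    for (u, v), cost in edges.items():
        preds[v].append((u, cost))

    node_free = [0.0] * num_nodes  # time at which each node becomes idle
    placement, finish = {}, {}

    for t, work in tasks.items():
        best = None
        for n in range(num_nodes):
            # Real earliest start: unscaled transfers from predecessors on other nodes.
            ready = max((finish[u] + (0 if placement[u] == n else c)
                         for u, c in preds[t]), default=0.0)
            start = max(ready, node_free[n])
            # Decision score: same estimate, but with transfers inflated by alpha.
            ready_scaled = max((finish[u] + (0 if placement[u] == n else alpha * c)
                                for u, c in preds[t]), default=0.0)
            score = max(ready_scaled, node_free[n]) + work
            if best is None or score < best[0]:
                best = (score, n, start)
        _, n, start = best
        placement[t] = n
        finish[t] = start + work
        node_free[n] = finish[t]

    makespan = max(finish.values())
    # Total cost of transfers that actually cross nodes.
    moved = sum(c for (u, v), c in edges.items() if placement[u] != placement[v])
    return placement, makespan, moved


# Toy fork: task "a" feeds "b" and "c" (values are made up for illustration).
tasks = {"a": 2, "b": 2, "c": 2}
edges = {("a", "b"): 1, ("a", "c"): 1}
for alpha in (1.0, 10.0):
    placement, makespan, moved = schedule(tasks, edges, num_nodes=2, alpha=alpha)
    print(alpha, placement, makespan, moved)
```

On this toy input, `alpha = 1.0` places `c` on the second node (makespan 5, one unit of data moved), whereas `alpha = 10.0` keeps all three tasks on one node (makespan 6, nothing moved). This is the kind of tradeoff between execution time and data transfer that the abstract describes, though the paper's own algorithm and workloads may differ.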