Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications

Authors:

Wenguang Chen

SC '11: State of the Practice Reports

Article No.: 11, Pages 1 - 10

https://doi.org/10.1145/2063348.2063363

Published: 12 November 2011

Abstract

The emergence of cloud services brings new possibilities for constructing and using HPC platforms. However, while cloud services provide the flexibility and convenience of customized, pay-as-you-go parallel computing, multiple studies over the past three years have indicated that cloud-based clusters need a significant performance boost to become a competitive choice, especially for tightly coupled parallel applications.

In this work, we examine the feasibility of running HPC applications in clouds. This study distinguishes itself from existing investigations in several ways: 1) We carry out a comprehensive examination of issues relevant to the HPC community, including performance, cost, user experience, and range of user activities. 2) We compare an Amazon EC2-based platform built upon its newly available HPC-oriented virtual machines with typical local cluster and supercomputer options, using benchmarks and applications with scale and problem size unprecedented in previous cloud HPC studies. 3) We perform detailed performance and scalability analysis to locate the chief limiting factors of state-of-the-art cloud-based clusters. 4) We present a case study on the impact of per-application parallel I/O system configuration, which is uniquely enabled by cloud services. Our results reveal that although the scalability of EC2-based virtual clusters still lags behind traditional HPC alternatives, they are rapidly gaining in overall performance and cost-effectiveness, making them feasible candidates for performing tightly coupled scientific computing. In addition, our detailed benchmarking and profiling disclose and analyze several problems regarding performance and performance stability on EC2.

Information

Published In

SC '11: State of the Practice Reports

November 2011

242 pages

ISBN:9781450311397

DOI:10.1145/2063348

Conference Chair:
Scott Lathrop
University of Chicago
,
Program Chairs:
Jim Costa
Sandia National Laboratories
,
William Kramer
National Center for Supercomputing Applications

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SC '11

Sponsor:

SIGARCH
IEEE-CS

SC '11: International Conference for High Performance Computing, Networking, Storage and Analysis

November 12 - 18, 2011

Seattle, Washington

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
