DOI: 10.1145/2063348.2063363

Cloud versus in-house cluster: evaluating Amazon cluster compute instances for running MPI applications

Published: 12 November 2011

Abstract

The emergence of cloud services brings new possibilities for constructing and using HPC platforms. However, while cloud services provide the flexibility and convenience of customized, pay-as-you-go parallel computing, multiple studies over the past three years have indicated that cloud-based clusters need a significant performance boost to become a competitive choice, especially for tightly coupled parallel applications.
In this work, we examine the feasibility of running HPC applications in clouds. This study distinguishes itself from existing investigations in several ways: 1) We carry out a comprehensive examination of issues relevant to the HPC community, including performance, cost, user experience, and range of user activities. 2) We compare an Amazon EC2-based platform built upon its newly available HPC-oriented virtual machines with typical local cluster and supercomputer options, using benchmarks and applications with scale and problem size unprecedented in previous cloud HPC studies. 3) We perform detailed performance and scalability analysis to locate the chief limiting factors of state-of-the-art cloud-based clusters. 4) We present a case study on the impact of per-application parallel I/O system configuration, which is uniquely enabled by cloud services. Our results reveal that though the scalability of EC2-based virtual clusters still lags behind traditional HPC alternatives, they are rapidly gaining in overall performance and cost-effectiveness, making them feasible candidates for performing tightly coupled scientific computing. In addition, our detailed benchmarking and profiling disclose and analyze several problems regarding performance and performance stability on EC2.

Published In

SC '11: State of the Practice Reports
November 2011
242 pages
ISBN: 9781450311397
DOI: 10.1145/2063348
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SC '11
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2023) SNDVI: a new scalable serverless framework to compute NDVI. Frontiers in High Performance Computing, 1. DOI: 10.3389/fhpcp.2023.1151530. Online publication date: 25-Aug-2023
  • (2023) rFaaS: Enabling High Performance Serverless with RDMA and Leases. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (897-907). DOI: 10.1109/IPDPS54959.2023.00094. Online publication date: May-2023
  • (2023) Optimizing job scheduling by using broad learning to predict execution times on HPC clusters. CCF Transactions on High Performance Computing, 6:4 (365-377). DOI: 10.1007/s42514-023-00137-z. Online publication date: 23-Feb-2023
  • (2023) Cloud benchmarking and performance analysis of an HPC application in Amazon EC2. Cluster Computing, 27:2 (2273-2290). DOI: 10.1007/s10586-023-04060-4. Online publication date: 28-Jun-2023
  • (2023) Performance Prediction for Scalability Analysis. Performance Analysis of Parallel Applications for HPC (129-161). DOI: 10.1007/978-981-99-4366-1_6. Online publication date: 19-Jun-2023
  • (2022) Noise in the Clouds. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:3 (1-27). DOI: 10.1145/3570609. Online publication date: 8-Dec-2022
  • (2022) Cloud Workload Characterization. Cloud Computing with Security and Scalability (77-99). DOI: 10.1007/978-3-031-07242-0_5. Online publication date: 4-Sep-2022
  • (2022) Cloud Computing Pyramid. Cloud Computing with Security and Scalability (51-62). DOI: 10.1007/978-3-031-07242-0_3. Online publication date: 4-Sep-2022
  • (2021) Evaluating Distributed Computing Infrastructures: An Empirical Study Comparing Hadoop Deployments on Cloud and Local Systems. IEEE Transactions on Cloud Computing, 9:3 (1075-1088). DOI: 10.1109/TCC.2019.2902377. Online publication date: 1-Jul-2021
  • (2021) Accelerating Parallel Applications in Cloud Platforms via Adaptive Time-Slice Control. IEEE Transactions on Computers, 70:7 (992-1005). DOI: 10.1109/TC.2020.2999619. Online publication date: 1-Jul-2021