DOI: 10.1145/2493123.2462919
research-article

A comparative study of high-performance computing on the cloud

Published: 17 June 2013

Abstract

The popularity of Amazon's EC2 cloud platform has increased in recent years. However, many high-performance computing (HPC) users consider dedicated high-performance clusters, typically found in large compute centers such as those in national laboratories, to be far superior to EC2 because of the latter's significant communication overhead. Our view is that this comparison is too narrow and that the proper metrics for comparing high-performance clusters to EC2 are turnaround time and cost.
In this paper, we compare the top-of-the-line EC2 cluster to HPC clusters at Lawrence Livermore National Laboratory (LLNL) based on turnaround time and total cost of execution. When measuring turnaround time, we include expected queue wait time on HPC clusters. Our results show that although, as expected, standard HPC clusters are superior in raw performance, EC2 clusters may produce better turnaround times. To estimate cost, we developed a pricing model, relative to EC2's node-hour prices, to set node-hour prices for (currently free) LLNL clusters. We observe that the cost-effectiveness of running an application on a cluster depends on raw performance and application scalability.
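
As a rough illustration of the two metrics named above, the sketch below computes turnaround time as expected queue wait plus execution time, and total cost as nodes times execution hours times a node-hour price. The `Run` class, its field names, the assumption that queue wait is not billed, and every number are hypothetical placeholders chosen for illustration; they are not the paper's pricing model or its measurements.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One application run on one cluster (all fields are hypothetical inputs)."""
    nodes: int                # number of nodes allocated
    exec_hours: float         # execution time, in hours
    queue_wait_hours: float   # expected queue wait, in hours (0 if nodes start on demand)
    node_hour_price: float    # price per node-hour, in dollars


def turnaround_hours(run: Run) -> float:
    """Turnaround time = expected queue wait + execution time."""
    return run.queue_wait_hours + run.exec_hours


def total_cost(run: Run) -> float:
    """Total cost = nodes x execution hours x node-hour price (wait assumed unbilled)."""
    return run.nodes * run.exec_hours * run.node_hour_price


# Placeholder numbers, not results from the paper.
hpc = Run(nodes=64, exec_hours=2.0, queue_wait_hours=6.0, node_hour_price=1.0)
cloud = Run(nodes=64, exec_hours=3.0, queue_wait_hours=0.0, node_hour_price=2.0)

for name, run in (("hpc", hpc), ("cloud", cloud)):
    print(f"{name}: turnaround={turnaround_hours(run):.1f} h, cost=${total_cost(run):.2f}")
```

With these made-up inputs the faster cluster still has the longer turnaround because of its queue wait, which is the kind of trade-off the abstract describes.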

Published In

HPDC '13: Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
June 2013
276 pages
ISBN:9781450319102
DOI:10.1145/2493123
General Chairs: Manish Parashar, Jon Weissman
Program Chairs: Dick Epema, Renato Figueiredo
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud
  2. cost
  3. high-performance computing
  4. turnaround time

Qualifiers

  • Research-article

Conference

HPDC'13

Acceptance Rates

HPDC '13 Paper Acceptance Rate 20 of 131 submissions, 15%;
Overall Acceptance Rate 166 of 966 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 46
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 22 Feb 2025

Citations

Cited By

  • (2024) LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-18. DOI: 10.1109/SC41406.2024.00070. Online publication date: 17-Nov-2024
  • (2024) An Empirical Analysis Of Cloud Platforms For High Performance Computing. 2024 4th International Conference on Innovative Practices in Technology and Management (ICIPTM), pp. 1-6. DOI: 10.1109/ICIPTM59628.2024.10563442. Online publication date: 21-Feb-2024
  • (2022) Noise in the Clouds. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:3, pp. 1-27. DOI: 10.1145/3570609. Online publication date: 8-Dec-2022
  • (2022) NVMe-oAF. Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, pp. 56-70. DOI: 10.1145/3502181.3531476. Online publication date: 27-Jun-2022
  • (2021) ENABLE HIGH PERFORMANCE COMPUTING IN CLOUD: A REVIEW. INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH, pp. 44-45. DOI: 10.36106/ijsr/4230701. Online publication date: 1-May-2021
  • (2021) Skyway: A Seamless Solution for Bursting Workloads from On-Premises HPC Clusters to Commercial Clouds. Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions, pp. 1-5. DOI: 10.1145/3437359.3465607. Online publication date: 17-Jul-2021
  • (2020) Benchmarking Microsoft Azure Virtual Machines for the use of HPC applications. 2020 11th International Conference on Information and Communication Systems (ICICS), pp. 382-387. DOI: 10.1109/ICICS49469.2020.239525. Online publication date: Apr-2020
  • (2020) HPCCloud Seer: A Performance Model Based Predictor for Parallel Applications on the Cloud. IEEE Access, 8, pp. 87978-87993. DOI: 10.1109/ACCESS.2020.2992880. Online publication date: 2020
  • (2019) The Chimera and the Cyborg. Advances in Science, Technology and Engineering Systems Journal, 4:2. DOI: 10.25046/aj040201. Online publication date: 2019
  • (2019) Performance Modeling of MPI-based Applications on Cloud Multicore Servers. Proceedings of the Rapid Simulation and Performance Evaluation: Methods and Tools, pp. 1-6. DOI: 10.1145/3300189.3300194. Online publication date: 21-Jan-2019
