
Cray XT4: an early evaluation for petascale scientific simulation

Published: 10 November 2007

ABSTRACT

The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.


Reviews

Graham K. Jenkins

I have to confess to turning a bright shade of green when I read this paper. I manage a cluster with about 200 AMD Opteron cores; the authors are from Oak Ridge National Laboratory (ORNL), where they have a system with some 23,000 Opteron cores.

The Cray XT4 is an evolutionary descendant of the XT3, which used AMD Opteron 100-series processors, a Cray SeaStar custom interconnect, and the Catamount lightweight-kernel operating system (OS). The SeaStar network interface controller (NIC) incorporates a PowerPC 440 processor with an onboard direct memory access (DMA) engine, enabling it to sustain bidirectional bandwidth exceeding six gigabytes per second (GB/s). Catamount, which runs on the compute nodes, was developed by Sandia National Laboratories to support one thread per node; it was subsequently enhanced to support dual-core Opterons.

The body of the paper compares the performance characteristics of a single-core XT3 system, a dual-core XT3 system, and a dual-core XT4 system. Application performance is the ultimate measure of system capability, but insight gained from microbenchmark analysis can be helpful in optimizing that performance. Among the microbenchmarks considered are network latency, matrix transposition (single-node and message passing interface (MPI)), and the fast Fourier transform (single-node and MPI). In each case, the results are presented in colored graphics. Well-known applications in atmospheric modeling (the Community Atmosphere Model (CAM)), ocean modeling (the Parallel Ocean Program (POP)), biomolecular simulation (nanoscale molecular dynamics (NAMD)), turbulent combustion (direct numerical simulation (DNS)), and fusion (the all-orders spectral algorithm (AORSA)) are then considered, and again the results are presented graphically.

The paper ends with an analysis of the results. There are no real surprises here: using both cores in a node can give performance improvements for algorithms that exhibit a degree of temporal locality. If you manage a high-performance computing facility, you will find this paper invaluable.

Online Computing Reviews Service
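The network-latency microbenchmarks the review mentions are typically measurements of the kind sketched below: a standard MPI ping-pong test, in which one rank bounces a small message off another and halves the averaged round-trip time. This is a minimal illustrative sketch, not the benchmark code the authors ran; the repetition count and one-byte payload are arbitrary choices made here for illustration.

```c
/* Minimal MPI ping-pong latency sketch (run with exactly 2 ranks),
 * e.g.: mpicc -std=c99 pingpong.c -o pingpong && mpirun -np 2 ./pingpong
 * Illustrative only; not the microbenchmark used in the paper. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000  /* arbitrary repetition count for averaging */

int main(int argc, char **argv) {
    int rank;
    char byte = 0;  /* one-byte payload to isolate latency from bandwidth */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            /* Send one byte to rank 1 and wait for the echo. */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Echo the byte straight back to rank 0. */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        /* One-way latency is half the average round-trip time. */
        printf("average one-way latency: %.2f us\n",
               elapsed / (2.0 * REPS) * 1e6);
    }

    MPI_Finalize();
    return 0;
}
```

Placing the two ranks on different nodes measures the SeaStar network path; placing them on the two cores of one node measures intra-node messaging, which is one way the single-core versus dual-core comparisons in the paper can be probed.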

Published in

SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
November 2007, 723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SC '07 paper acceptance rate: 54 of 268 submissions (20%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
