DOI: 10.1145/1362622.1362675

Cray XT4: an early evaluation for petascale scientific simulation

Published: 10 November 2007


The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.






Graham K. Jenkins

I have to confess to turning a bright shade of green when I read this paper. This was because I manage a cluster having about 200 AMD Opteron cores; the authors are from Oak Ridge National Laboratory (ORNL), where they have a system with some 23,000 Opteron cores!

It should be noted that the Cray XT4 is an evolutionary descendant of the XT3, which used AMD Opteron 100-series processors, a Cray SeaStar custom interconnect, and a Catamount lightweight kernel operating system (OS). The SeaStar network interface controller (NIC) actually incorporates a PowerPC 440 processor with a direct memory access (DMA) engine onboard, enabling it to provide a sustained bidirectional bandwidth exceeding six gigabytes per second (GB/s). The Catamount OS used in the execute nodes was developed by Sandia National Laboratories to support one thread per node; it was subsequently enhanced to enable the use of dual-core Opterons.

The body of the paper details a comparison of performance characteristics for a single-core XT3 cluster, a dual-core XT3 cluster, and a dual-core XT4 cluster. Application performance is the ultimate measure of system capability; insight gained from an analysis of microbenchmark tests can be helpful in optimizing performance. Among the microbenchmarks considered are network latency, matrix transposition (single and message-passing interface (MPI)), and the fast Fourier transform (single and MPI). In each case, the results are presented in colored graphics. Some well-known applications in atmospheric modeling (the community atmosphere model (CAM)), ocean modeling (the parallel ocean program (POP)), biomolecular simulation (nanoscale molecular dynamics (NAMD)), turbulent combustion (direct numerical simulation (DNS)), and fusion (all orders spectral algorithm (AORSA)) are then considered. Again, results are graphically presented. The paper ends with an analysis of results.
There are no real surprises here; using both cores in a node can give performance improvements for algorithms that exhibit degrees of temporal locality. If you manage a high-performance computing facility, you'll find this paper invaluable.

Online Computing Reviews Service





Published In

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]



Association for Computing Machinery

New York, NY, United States




Author Tags

  1. AORSA
  2. CAM
  3. Cray XT4
  4. HPCC
  5. IOR
  6. NAMD
  7. POP
  8. S3D


  • Research-article



SC '07

Acceptance Rates

SC '07 Paper Acceptance Rate 54 of 268 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%



Article Metrics

  • Downloads (Last 12 months): 5
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 15 Feb 2025



Cited By

  • (2021) An Analysis of System Balance and Architectural Trends Based on Top500 Supercomputers. The International Conference on High Performance Computing in Asia-Pacific Region. 10.1145/3432261.3432263 (11-22). Online publication date: 20-Jan-2021
  • (2021) Architecture and Functionality of the Collective Operations Subnet of the Angara Interconnect. Distributed Computer and Communication Networks. 10.1007/978-3-030-66471-8_17 (209-219). Online publication date: 2-Jan-2021
  • (2019) Profiling the Usage of an Extreme-Scale Archival Storage System. 2019 IEEE 27th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 10.1109/MASCOTS.2019.00050 (410-422). Online publication date: Oct-2019
  • (2018) FTRP: a new fault tolerance framework using process replication and prefetching for high-performance computing. Frontiers of Information Technology & Electronic Engineering. 10.1631/FITEE.1601450, 19:10 (1273-1290). Online publication date: 28-Nov-2018
  • (2018) Scalable Behavioral Emulation of Extreme-Scale Systems Using Structural Simulation Toolkit. Proceedings of the 47th International Conference on Parallel Processing. 10.1145/3225058.3225124 (1-11). Online publication date: 13-Aug-2018
  • (2017) Enabling Parallel Simulation of Large-Scale HPC Network Systems. IEEE Transactions on Parallel and Distributed Systems. 10.1109/TPDS.2016.2543725, 28:1 (87-100). Online publication date: 1-Jan-2017
  • (2016) Interconnection Networks in Petascale Computer Systems. ACM Computing Surveys. 10.1145/2983387, 49:3 (1-24). Online publication date: 16-Sep-2016
  • (2015) Scalable black-box prediction models for multi-dimensional adaptation on NUMA multi-cores. International Journal of Parallel, Emergent and Distributed Systems. 10.1080/17445760.2014.895346, 30:3 (193-210). Online publication date: 1-May-2015
  • (2014) A case study in using massively parallel simulation for extreme-scale torus network codesign. Proceedings of the 2nd ACM SIGSIM Conference on Principles of Advanced Discrete Simulation. 10.1145/2601381.2601383 (27-38). Online publication date: 18-May-2014
  • (2014) On generating multicast routes for SpiNNaker. Proceedings of the 11th ACM Conference on Computing Frontiers. 10.1145/2597917.2597938 (1-10). Online publication date: 20-May-2014
