ABSTRACT
The scientific simulation capabilities of next-generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual-core Opteron processors with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and understanding the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous-generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.
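For concreteness, the sketch below illustrates the kind of MPI ping-pong micro-benchmark commonly used to isolate interconnect latency on systems like the XT4; it is a minimal illustration, not the paper's actual test harness, and the message size, iteration count, and output format are assumptions chosen for clarity.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Minimal ping-pong latency micro-benchmark sketch (illustrative only).
   Two ranks exchange a small message repeatedly; halving the measured
   round-trip time estimates one-way latency. Run with at least 2 ranks. */
int main(int argc, char **argv)
{
    int rank, iters = 1000, nbytes = 8;   /* small message: latency regime */
    char buf[8];
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, 0, sizeof(buf));

    MPI_Barrier(MPI_COMM_WORLD);          /* synchronize before timing */
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)  /* one-way latency = round-trip time / 2 */
        printf("avg one-way latency: %.3f us\n",
               1e6 * (t1 - t0) / (2.0 * iters));

    MPI_Finalize();
    return 0;
}

Sweeping nbytes from a few bytes up to several megabytes would extend this latency test into a bandwidth test, which is how micro-benchmark suites typically characterize an interconnect across message-size regimes.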