
Cray XT4: an early evaluation for petascale scientific simulation

Published: 10 November 2007

ABSTRACT

The scientific simulation capabilities of next generation high-end computing technology will depend on striking a balance among memory, processor, I/O, and local and global network performance across the breadth of the scientific simulation space. The Cray XT4 combines commodity AMD dual core Opteron processor technology with the second generation of Cray's custom communication accelerator in a system design whose balance is claimed to be driven by the demands of scientific simulation. This paper presents an evaluation of the Cray XT4 using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications. Results gathered from several strategic application domains are compared with observations on the previous generation Cray XT3 and other high-end computing systems, demonstrating performance improvements across a wide variety of application benchmark problems.


Reviews

Graham K. Jenkins

I have to confess to turning a bright shade of green when I read this paper. I manage a cluster with about 200 AMD Opteron cores; the authors are from Oak Ridge National Laboratory (ORNL), where they have a system with some 23,000 Opteron cores.

The Cray XT4 is an evolutionary descendant of the XT3, which used AMD Opteron 100-series processors, a Cray SeaStar custom interconnect, and the Catamount lightweight-kernel operating system (OS). The SeaStar network interface controller (NIC) incorporates a PowerPC 440 processor with an onboard direct memory access (DMA) engine, enabling it to sustain bidirectional bandwidth exceeding six gigabytes per second (GB/s). Catamount, which runs on the compute nodes, was developed by Sandia National Laboratories to support one thread per node; it was subsequently enhanced to support dual-core Opterons.

The body of the paper compares the performance characteristics of a single-core XT3 system, a dual-core XT3 system, and a dual-core XT4 system. Application performance is the ultimate measure of system capability, but insight gained from microbenchmark analysis can be helpful in optimizing that performance. Among the microbenchmarks considered are network latency, matrix transposition (single-node and message passing interface (MPI)), and the fast Fourier transform (single-node and MPI). In each case, the results are presented in colored graphics. Well-known applications in atmospheric modeling (the Community Atmosphere Model (CAM)), ocean modeling (the Parallel Ocean Program (POP)), biomolecular simulation (nanoscale molecular dynamics (NAMD)), turbulent combustion (direct numerical simulation (DNS)), and fusion (the all-orders spectral algorithm (AORSA)) are then considered, and again the results are presented graphically.

The paper ends with an analysis of the results. There are no real surprises here: using both cores in a node can give performance improvements for algorithms that exhibit a degree of temporal locality. If you manage a high-performance computing facility, you will find this paper invaluable.

Online Computing Reviews Service
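The network-latency microbenchmarks the review mentions are typically measurements of the kind sketched below: a standard MPI ping-pong test, in which one rank bounces a small message off another and halves the averaged round-trip time. This is a minimal illustrative sketch, not the benchmark code the authors ran; the repetition count and one-byte payload are arbitrary choices made here for illustration.

```c
/* Minimal MPI ping-pong latency sketch (run with exactly 2 ranks),
 * e.g.: mpicc -std=c99 pingpong.c -o pingpong && mpirun -np 2 ./pingpong
 * Illustrative only; not the microbenchmark used in the paper. */
#include <mpi.h>
#include <stdio.h>

#define REPS 1000  /* arbitrary repetition count for averaging */

int main(int argc, char **argv) {
    int rank;
    char byte = 0;  /* one-byte payload to isolate latency from bandwidth */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            /* Send one byte to rank 1 and wait for the echo. */
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Echo the byte straight back to rank 0. */
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0) {
        /* One-way latency is half the average round-trip time. */
        printf("average one-way latency: %.2f us\n",
               elapsed / (2.0 * REPS) * 1e6);
    }

    MPI_Finalize();
    return 0;
}
```

Placing the two ranks on different nodes measures the SeaStar network path; placing them on the two cores of one node measures intra-node messaging, which is one way the single-core versus dual-core comparisons in the paper can be probed.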

Published in

SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
November 2007, 723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
Copyright © 2007 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SC '07 paper acceptance rate: 54 of 268 submissions (20%). Overall acceptance rate: 1,516 of 6,373 submissions (24%).
