DOI: 10.1109/SC.2005.11

An Application-Based Performance Characterization of the Columbia Supercluster

Published: 12 November 2005


Columbia is a 10,240-processor supercluster consisting of 20 Altix nodes with 512 processors each, and is currently ranked as one of the fastest computers in the world. In this paper, we present the performance characteristics of Columbia obtained on up to four computing nodes interconnected via the InfiniBand and/or NUMAlink4 communication fabrics. We evaluate floating-point performance, memory bandwidth, message-passing communication speeds, and compilers using a subset of the HPC Challenge benchmarks and some of the NAS Parallel Benchmarks, including the multi-zone versions. We present detailed performance results for three scientific applications of interest to NASA: one from molecular dynamics and two from computational fluid dynamics. Our results show that both the NUMAlink4 and InfiniBand interconnects hold promise for multi-node application scaling to at least 2048 processors.
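
The processor counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and reading the 2048-processor figure as four full 512-processor nodes is an assumption that the abstract's numbers appear to support rather than something it states outright.

```python
# Figures taken from the abstract; all names here are illustrative.
PROCESSORS_PER_NODE = 512  # processors in each Altix node
TOTAL_NODES = 20           # nodes in the full Columbia system
NODES_TESTED = 4           # largest configuration measured (assumed full nodes)


def processor_count(nodes: int, per_node: int = PROCESSORS_PER_NODE) -> int:
    """Return the number of processors in a group of identical nodes."""
    return nodes * per_node


total = processor_count(TOTAL_NODES)
tested = processor_count(NODES_TESTED)

print(total)                    # 10240
print(tested)                   # 2048
print(f"{tested / total:.0%}")  # 20%
```

Under that assumption, the measurements cover about one fifth of the full machine.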






Published In

SC '05: Proceedings of the 2005 ACM/IEEE conference on Supercomputing
November 2005
829 pages



IEEE Computer Society, United States


Author Tags

  1. HPC Challenge benchmarks
  2. NAS Parallel Benchmarks
  3. SGI Altix
  4. computational fluid dynamics
  5. molecular dynamics
  6. multi-block overset grids
  7. multi-level parallelism



Acceptance Rates

SC '05 Paper Acceptance Rate 62 of 260 submissions, 24%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%


Cited By

  • (2017) Performance modeling and optimization of parallel LU-SGS on many-core processors for 3D high-order CFD simulations. The Journal of Supercomputing, 73:6 (2506-2524). DOI: 10.1007/s11227-016-1943-0. Online publication date: 1-Jun-2017.
  • (2010) Exploiting 162-Nanosecond End-to-End Communication Latency on Anton. Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (1-12). DOI: 10.1109/SC.2010.23. Online publication date: 13-Nov-2010.
  • (2008) Scientific application-based performance comparison of SGI Altix 4700, IBM POWER5+, and SGI ICE 8200 supercomputers. Proceedings of the 2008 ACM/IEEE conference on Supercomputing (1-12). DOI: 10.5555/1413370.1413378. Online publication date: 15-Nov-2008.
  • (2008) Benchmarking the Columbia Supercluster. International Journal of High Performance Computing Applications, 22:1 (97-112). DOI: 10.5555/1340941.1340946. Online publication date: 1-Feb-2008.
  • (2008) Performance evaluation of a multi-zone application in different OpenMP approaches. International Journal of Parallel Programming, 36:3 (312-325). DOI: 10.1007/s10766-008-0074-5. Online publication date: 1-Jun-2008.
  • (2007) High Resolution Aerospace Applications Using the NASA Columbia Supercomputer. International Journal of High Performance Computing Applications, 21:1 (106-126). DOI: 10.1177/1094342006074872. Online publication date: 1-Feb-2007.
  • (2006) Interconnect performance evaluation of SGI Altix 3700 BX2, Cray X1, Cray Opteron cluster, and Dell PowerEdge. Proceedings of the 20th international conference on Parallel and distributed processing (323-323). DOI: 10.5555/1898699.1898858. Online publication date: 25-Apr-2006.
  • (2005) High Resolution Aerospace Applications using the NASA Columbia Supercomputer. Proceedings of the 2005 ACM/IEEE conference on Supercomputing. DOI: 10.1109/SC.2005.32. Online publication date: 12-Nov-2005.
  • (2005) Impact of the Columbia supercomputer on NASA science and engineering applications. Proceedings of the 7th international conference on Distributed Computing (293-305). DOI: 10.1007/11603771_33. Online publication date: 27-Dec-2005.
