Research article
DOI: 10.1145/1362622.1362671

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors

Published: 10 November 2007

Abstract

Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which require specific support from the network interface hardware for efficient execution. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however, with the advent of multi-core processors, the balance of computation to communication capabilities has shifted toward computation. This paper explores the message rates that are achievable with multi-core processors and simplified PGAS support on a more conventional network interface. For message rate tests, we find that simple network interface hardware is more than sufficient. We also find that even typical data distributions, such as cyclic or block-cyclic, do not need specialized hardware support. Finally, we assess the impact of such support on the well-known RandomAccess benchmark.




Published In

SC '07: Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
November 2007
723 pages
ISBN: 9781595937643
DOI: 10.1145/1362622
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States



Acceptance Rates

SC '07 Paper Acceptance Rate: 54 of 268 submissions, 20%
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%


Cited By

• (2014) Enabling communication concurrency through flexible MPI endpoints. The International Journal of High Performance Computing Applications 28(4):390-405. DOI: 10.1177/1094342014548772
• (2014) Reuse Distance Based Circuit Replacement in Silicon Photonic Interconnection Networks for HPC. Proceedings of the 2014 IEEE 22nd Annual Symposium on High-Performance Interconnects, 49-56. DOI: 10.1109/HOTI.2014.20
• (2013) On achieving high message rates. Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 498-505. DOI: 10.1109/CCGrid.2013.43
• (2012) A preliminary evaluation of the hardware acceleration of the Cray Gemini interconnect for PGAS languages and comparison with MPI. ACM SIGMETRICS Performance Evaluation Review 40(2):92-98. DOI: 10.1145/2381056.2381077
• (2011) Experiences with UPC on TILE-64 processor. Proceedings of the 2011 IEEE Aerospace Conference, 1-9. DOI: 10.1109/AERO.2011.5747452
• (2010) Hybrid PGAS runtime support for multicore nodes. Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 1-10. DOI: 10.1145/2020373.2020376
• (2009) From Silicon to Science. ACM Transactions on Reconfigurable Technology and Systems 2(4):1-15. DOI: 10.1145/1575779.1575786
• (2009) A Resource Optimized Remote-Memory-Access Architecture for Low-latency Communication. Proceedings of the 2009 International Conference on Parallel Processing, 220-227. DOI: 10.1109/ICPP.2009.62
• (2008) Runtime optimization of vector operations on large scale SMP clusters. Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 122-132. DOI: 10.1145/1454115.1454134
• (2008) HPPNET: A novel network for HPC and its implication for communication software. 2008 IEEE International Symposium on Parallel and Distributed Processing, 1-8. DOI: 10.1109/IPDPS.2008.4536146
