On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact
Principles and Practice of Parallel Programming
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
San Jose, California, USA
SESSION: Communication
Pages: 46 - 54
Year of Publication: 2007
ISBN: 978-1-59593-602-8
Authors
Amith R. Mamidala, The Ohio State University, Columbus, OH
Sundeep Narravula, The Ohio State University, Columbus, OH
Abhinav Vishnu, The Ohio State University, Columbus, OH
Gopal Santhanaraman, The Ohio State University, Columbus, OH
Dhabaleswar K. Panda, The Ohio State University, Columbus, OH
ABSTRACT
The communication subsystem plays a pivotal role in achieving scalable performance in clusters. The communication semantics employed are dictated by the programming model used by the application, such as MPI or UPC. Among the gamut of communication primitives, collective and one-sided operations are especially significant and have to be designed to harness the capabilities and features exposed by the underlying network. In some cases, there is a direct match between the semantics of the operations and the underlying network primitives. InfiniBand provides two transport modes: (i) connection-oriented Reliable Connection (RC), supporting memory and channel semantics, and (ii) connection-less Unreliable Datagram (UD), supporting channel semantics. Achieving good performance and scalability requires careful analysis and design of communication primitives based on these options. In this paper, we evaluate the scalability and performance trade-offs between the RC and UD transport modes. We study the semantic advantages of mapping collective and one-sided operations onto the memory and channel semantics of InfiniBand (IBA). We take AlltoAll as a case study to demonstrate the benefits of RDMA over Send/Recv and to show the performance/memory trade-offs across IB transports. Our experimental results show that UD-based AlltoAll performs 38% better than Bruck's algorithm for short messages and up to two times better than the direct AlltoAll over RC. Since InfiniBand does not provide RDMA over UD in hardware, we emulate it in our study. Our results show a performance dip of up to a factor of three for emulated RDMA Read latency compared to RC, highlighting the need for hardware-based RDMA operations over UD.
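To make the two AlltoAll baselines named in the abstract concrete, the sketch below simulates a direct exchange and the textbook form of Bruck's algorithm in plain Python. It is an illustration only, not the authors' implementation: the function names `direct_alltoall` and `bruck_alltoall` are hypothetical, processes are modeled as rows of a list, and no InfiniBand transport (RC or UD) is involved. It only shows why Bruck's algorithm sends about log2(n) messages per process while the direct algorithm sends n - 1.

```python
def direct_alltoall(send):
    """Direct exchange: every process sends one block to every other process.

    send[src][dst] is the block that process src wants delivered to dst.
    Returns (recv, messages_per_process) with recv[dst][src] == send[src][dst].
    """
    n = len(send)
    recv = [[send[src][dst] for src in range(n)] for dst in range(n)]
    return recv, n - 1


def bruck_alltoall(send):
    """Textbook Bruck AlltoAll, simulated for all processes at once.

    Assumption: this follows the commonly described three-phase scheme
    (rotate, log-step exchange, inverse rotate); it is a sketch, not the
    version evaluated in the paper.
    """
    n = len(send)
    # Phase 1: local rotation, so that index i holds the block that still
    # has to travel a distance of i processes.
    buf = [[send[p][(p + i) % n] for i in range(n)] for p in range(n)]

    # Phase 2: in the step with distance k (a power of two), process p
    # forwards to process (p + k) % n every block whose index has bit k set.
    steps = 0
    k = 1
    while k < n:
        nxt = [row[:] for row in buf]
        for p in range(n):
            dst = (p + k) % n
            for i in range(n):
                if i & k:
                    nxt[dst][i] = buf[p][i]
        buf = nxt
        steps += 1
        k <<= 1

    # Phase 3: inverse rotation. The block at index i arrived from the
    # process that is i positions behind the receiver.
    recv = [[None] * n for _ in range(n)]
    for p in range(n):
        for i in range(n):
            recv[p][(p - i) % n] = buf[p][i]
    return recv, steps


if __name__ == "__main__":
    n = 8
    send = [[(src, dst) for dst in range(n)] for src in range(n)]
    direct, direct_msgs = direct_alltoall(send)
    bruck, bruck_msgs = bruck_alltoall(send)
    print(direct == bruck, direct_msgs, bruck_msgs)  # True 7 3
```

Each Bruck message carries several blocks at once, so the lower message count is paid for with more data per message and extra local copying. This is the usual reason it is considered for short messages, which matches the message-size regime the abstract reports for the comparison.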