ABSTRACT
This paper proposes a novel approach to optimizing the performance of all-to-all collective communication by exploiting the concurrency available in modern networks such as InfiniBand and Quadrics. Using the MPI AllGather operation as an example, we describe how network concurrency can be exploited in an optimized implementation of this operation. For a 32-KB message on 128 processors, our new algorithm yields a 65% improvement over leading MPI implementations on the InfiniBand cluster at Virginia Tech and an 89% improvement on the Quadrics cluster at Pacific Northwest National Laboratory.