ABSTRACT
The performance of multiprocessor systems depends mainly on processor clock speed and memory latency. As processor speeds increase rapidly, memory latency becomes a major performance bottleneck in multiprocessor systems. In this paper, we propose a dual-link interconnection topology and an effective routing scheme for it that reduce remote memory latency on the interconnection network. The topology can be implemented at the same cost as a traditional bi-directional ring system. We compare the performance of the proposed system with that of a traditional bi-directional ring-based system and a toroidal mesh-based system. Simulations show that the proposed system outperforms the traditional bi-directional ring-based system by 42 to 101% and the toroidal mesh-based system by 4 to 14%.