Scalable parallel application launch on Cplant™
Source
Conference on High Performance Networking and Computing
Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM)
Denver, Colorado
Pages: 40 - 40
Year of Publication: 2001
ISBN: 1-58113-293-X
ABSTRACT
This paper describes the components of a runtime system for launching parallel applications and presents performance results for starting a job on more than a thousand nodes of a workstation cluster. This runtime system was developed at Sandia National Laboratories as part of the Computational Plant (Cplant™) project, which is deploying large-scale parallel computing clusters using commodity hardware and the Linux operating system. We have designed and implemented a flexible runtime system that allows for launching parallel jobs on thousands of nodes in a matter of seconds. The interactions of the components are described, and the key issues that address the scalability and performance of the runtime system are discussed. We also present performance results of launching executables of varying sizes on more than a thousand nodes.
CITED BY 3
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Scott Pakin, Salvador Coll, STORM: lightning-fast resource management, Proceedings of the 2002 ACM/IEEE conference on Supercomputing, p.1-26, November 16, 2002, Baltimore, Maryland
Wei Huang, Jiuxing Liu, Bulent Abali, Dhabaleswar K. Panda, A case for high performance computing with virtual machines, Proceedings of the 20th annual international conference on Supercomputing, June 28-July 01, 2006, Cairns, Queensland, Australia