Scalable parallel application launch on Cplant™
Source: Conference on High Performance Networking and Computing
Proceedings of the 2001 ACM/IEEE conference on Supercomputing (CDROM)
Denver, Colorado
Pages: 40
Year of Publication: 2001
ISBN: 1-58113-293-X
Authors
Ron Brightwell, Sandia National Laboratories, Albuquerque, NM
Lee Ann Fisk, Sandia National Laboratories, Albuquerque, NM
Sponsors
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS\DATC: IEEE Computer Society
Publisher
ACM, New York, NY, USA
DOI: http://doi.acm.org/10.1145/582034.582074

ABSTRACT

This paper describes the components of a runtime system for launching parallel applications and presents performance results for starting a job on more than a thousand nodes of a workstation cluster. This runtime system was developed at Sandia National Laboratories as part of the Computational Plant (Cplant™) project, which is deploying large-scale parallel computing clusters using commodity hardware and the Linux operating system. We have designed and implemented a flexible runtime system that allows for launching parallel jobs on thousands of nodes in a matter of seconds. The interactions of the components are described, and the key issues that address the scalability and performance of the runtime system are discussed. We also present performance results of launching executables of varying sizes on more than a thousand nodes.
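The abstract does not spell out how the executable reaches the nodes. As an illustration only, the sketch below assumes a tree-structured (binomial) fan-out, a common technique for distributing a program image to many nodes: every node that already holds the image forwards it to one more node per round, so the number of holders doubles each round. It shows why the number of transfer rounds grows with the logarithm of the node count rather than linearly. The function name and the node numbering are hypothetical and are not Cplant's actual interfaces.

```python
"""Illustrative sketch (hypothetical, not Cplant's code): binomial-tree fan-out
of an executable image from one launcher node to num_nodes nodes."""


def fanout_schedule(num_nodes):
    """Return a list of rounds, each a list of (sender, receiver) pairs.

    Node 0 is a hypothetical launcher that starts with the image. In each
    round, every node holding the image sends it to one node that lacks it,
    so the number of holders doubles until all nodes are covered.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    holders = 1  # nodes 0 .. holders-1 already hold the image
    rounds = []
    while holders < num_nodes:
        # Sender i forwards to node i + holders; the last round may be partial.
        count = min(holders, num_nodes - holders)
        sends = [(i, i + holders) for i in range(count)]
        rounds.append(sends)
        holders += count
    return rounds


if __name__ == "__main__":
    for n in (16, 1024, 1025):
        tree_rounds = len(fanout_schedule(n))
        serial_transfers = n - 1  # launcher sends the image to each node in turn
        print(f"{n} nodes: {tree_rounds} tree rounds vs {serial_transfers} serial transfers")
```

With 1,024 nodes the tree needs 10 rounds where a launcher sending to each node in turn needs 1,023 transfers; this is the kind of logarithmic behavior that makes launching on thousands of nodes in seconds plausible, though the paper's own mechanism and measurements should be consulted for the actual design.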



