Mobile MPI programs in computational grids
Source: Principles and Practice of Parallel Programming
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
New York, New York, USA
SESSION: Communication
Pages: 22 - 31
Year of Publication: 2006
ISBN: 1-59593-189-9
ABSTRACT
Utility computing is becoming a popular way of exploiting the potential of computational grids. In utility computing, users are provided with computational power in a transparent manner, similar to the way in which electrical utilities supply power to their customers. To take full advantage of utility computing, an application needs to be mobile; that is, it needs to be able to migrate between heterogeneous computing platforms while it is executing. Further, it needs to be able to adapt to the computing resources at each site, such as the number of available physical processors. At present, there are few high-performance computing applications of this sort, and re-engineering legacy codes to be mobile can take enormous effort.

In this paper, we describe the $PC^3$ system, which converts C/MPI codes into mobile programs almost transparently. Because it is based on portable application-level checkpointing, it enables the state of running applications to be saved so that the application can be restarted on different architectures, operating systems and MPI implementations. Moreover, the number of processors on these machines can be different. To our knowledge, this is the first system to provide all these features. Experimental results show that the overhead introduced by the system is usually small.