ABSTRACT
Modern distributed applications pose increasing demands for high availability, automatic management, and dynamic configuration of their software systems. This paper presents the architecture of Sampa, a System for Availability Management of Process-based Applications, which aims to fulfill these requirements. The system has been designed to support the management of fault-tolerant DCE-based distributed programs according to user-provided, application-specific availability specifications. It is intended to detect and automatically react to faults such as node crashes, network partitions, process crashes, and hang-ups. In this paper, we focus on the design of some of its services (the monitoring, checkpointing, and configuration management facilities) and show how they can be used to manage a generic fault-tolerant service.