ABSTRACT
We describe the design and implementation of a clustering service for a high-performance, shared-disk file system. The service provides failure detection and recovery, reliable end-to-end messaging, and a centralized and recoverable management interface. We implement novel optimizations in the voting protocol that resolves cluster membership. These optimizations allow clusters to form as quickly as possible without introducing livelock or requiring careful tuning of timeout parameters. Our treatment includes performance results that quantify the scalability of the system and measure recovery times.