Abstract
The Farsite file system is a storage service that runs on the desktop computers of a large organization and provides the semantics of a central NTFS file server. The motivation behind the Farsite project was to harness the unused storage and network resources of desktop computers to provide a service that is reliable, available, and secure despite the fact that it runs on machines that are unreliable, often unavailable, and of limited security. A main premise of the project has been that building a scalable system requires more than scalable algorithms: To be scalable in a practical sense, a distributed system targeting 10^5 nodes must tolerate a significant (and never-zero) rate of machine failure, a small number of malicious participants, and a substantial number of opportunistic participants. It also must automatically adapt to the arrival and departure of machines and changes in machine availability, and it must be able to autonomically repartition its data and metadata as necessary to balance load and alleviate hotspots. We describe the history of the project, including its multiple versions of major system components, the unique programming style and software-engineering environment we created to facilitate development, our distributed debugging framework, and our experiences with formal system specification. We also report on the lessons we learned during this development.
Index Terms
- The Farsite project: a retrospective