The Farsite project: a retrospective

Published: 01 April 2007

Abstract

The Farsite file system is a storage service that runs on the desktop computers of a large organization and provides the semantics of a central NTFS file server. The motivation behind the Farsite project was to harness the unused storage and network resources of desktop computers to provide a service that is reliable, available, and secure despite the fact that it runs on machines that are unreliable, often unavailable, and of limited security. A main premise of the project has been that building a scalable system requires more than scalable algorithms: to be scalable in a practical sense, a distributed system targeting 10^5 nodes must tolerate a significant (and never-zero) rate of machine failure, a small number of malicious participants, and a substantial number of opportunistic participants. It also must automatically adapt to the arrival and departure of machines and changes in machine availability, and it must be able to autonomically repartition its data and metadata as necessary to balance load and alleviate hotspots. We describe the history of the project, including its multiple versions of major system components, the unique programming style and software-engineering environment we created to facilitate development, our distributed debugging framework, and our experiences with formal system specification. We also report on the lessons we learned during this development.
