skip to main content
article

Speculative execution in a distributed file system

Published:20 October 2005Publication History
Skip Abstract Section

Abstract

Speculator provides Linux kernel support for speculative execution. It allows multiple processes to share speculative state by tracking causal dependencies propagated through inter-process communication. It guarantees correct execution by preventing speculative processes from externalizing output, e.g., sending a network message or writing to the screen, until the speculations on which that output depends have proven to be correct. Speculator improves the performance of distributed file systems by masking I/O latency and increasing I/O throughput. Rather than block during a remote operation, a file system predicts the operation's result, then uses Speculator to checkpoint the state of the calling process and speculatively continue its execution based on the predicted result. If the prediction is correct, the checkpoint is discarded; if it is incorrect, the calling process is restored to the checkpoint, and the operation is retried. We have modified the client, server, and network protocol of two distributed file systems to use Speculator. For PostMark and Andrew-style benchmarks, speculative execution results in a factor of 2 performance improvement for NFS over local-area networks and an order of magnitude improvement over wide-area networks. For the same benchmarks, Speculator enables the Blue File System to provide the consistency of single-copy file semantics and the safety of synchronous I/O, yet still outperform current distributed file systems with weaker consistency and safety.

References

  1. Adya, A., Bolosky, W. J., Castro, M., Cermak, G., Chaiken, R., Douceur, J. R., Howell, J., Lorch, J. R., Theimer, M., and Wattenhofer, R. P. FARSITE: Federated, available, and reliable storage for an incompletely trusted environment. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (Boston, MA, December 2002), pp. 1--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Birrell, A. D., Hisgen, A., Jerian, C., Mann, T., and Swart, G. The Echo distributed file system. Tech. Rep. 111, Digital Equipment Corporation, Palo Alto, CA, USA, October 1993.Google ScholarGoogle Scholar
  3. Callaghan, B., Pavlowski, B., and Staubach, P. NFS Version 3 Protocol Specification. Tech. Rep. RFC 1813, IETF, June 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Carson, M. Adaptation and Protocol Testing thorugh Network Emulation. NIST, http://snad.ncsl.nist.gov/itg/nistnet/slides/index.htm.Google ScholarGoogle Scholar
  5. Castro, M., and Liskov, B. Proactive recovery in a byzantine-fault-tolerant system. In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (San Diego, CA, October 2000). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Chang, F., and Gibson, G. Automatic I/O hint generation through speculative execution. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation (New Orleans, LA, February 1999), pp. 1--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cheriton, D., and Duda, K. Logged virtual memory. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, Dec. 1995), pp. 26--39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Elnozahy, E. N., Alvisi, L., Wang, Y.-M., and Johnson, D. B. A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys 34, 3 (September 2002), 375--408. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Franklin, M., and Sohi, G. ARB: A hardware mechanism for dynamic reordering of memory references. IEEE Transactions on Computers 45, 5 (May 1996), 552--571. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Fraser, K., and Chang, F. Operating system I/O speculation: How two invocations are faster than one. In Proceedings of the 2003 USENIX Technical Conference (San Antonio, TX, June 2003), pp. 325--338.Google ScholarGoogle Scholar
  11. Haerder, T., and Reuter, A. Principles of Transaction-Oriented Database Recovery. ACM Computing Surveys 15, 4 (December 1983), 287--317. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Hammond, L., Willey, M., and Olukotun, K. Data speculation support for a chip multiprocessor. In Proc. of the 8th Intl. ACM Conf. on Arch. Support for Programming Languages and Operating Systems (San Jose, CA, October 1998), pp. 58--69. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Howard, J. H., Kazar, M. L., Menees, S. G., Nichols, D. A., Satyanarayanan, M., Sidebotham, R. N., and West, M. J. Scale and performance in a distributed file system. ACM Transactions on Computer Systems 6, 1 (February 1988). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Jefferson, D. Virtual time. ACM Transactions on Programming Languages and Systems 7, 3 (July 1985), 404--425. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jefferson, D., Beckman, B., Wieland, F., Blume, L., DiLoreto, M., P.Hontalas, Laroche, P., Sturdevant, K., Tupman, J., Warren, V., Weidel, J., Younger, H., and Bellenot, S. Time Warp operating system. In Proceedings of the 11th ACM Symposium on Operating Systems Principles (Austin, TX, November 1987), pp. 77--93. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Katcher, J. PostMark: A new file system benchmark. Tech. Rep. TR3022, Network Appliance, 1997.Google ScholarGoogle Scholar
  17. King, S. T., and Chen, P. M. Backtracking intrusions. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (Bolton Landing, NY, October 2003), pp. 223--236. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kistler, J. J., and Satyanarayanan, M. Disconnected operation in the Coda file system. ACM Transactions on Computer Systems 10, 1 (February 1992). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (1978), 558--565. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Li, J., Krohn, M., Mazières, D., and Shasha, D. Secure untrusted data repository (SUNDR). In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 121--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Liskov, B., and Rodrigues, R. Transactional file systems can be fast. In Proceedings of the 11th SIGOPS European Workshop (Leuven, Belgium, September 2004). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Mazières, D., Kaminsky, M., Kaashoek, M. F., and Witchel, E. Separating key management from file system security. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (Kiawah Island, SC, December 1999), pp. 124--139. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Nelson, M. N., Welsh, B. B., and Ousterhout, J. K. Caching in the Sprite network file system. ACM Transactions on Computer Systems 6, 1 (1988), 134--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Nightingale, E. B., and Flinn, J. Energy-efficiency and storage flexibility in the Blue File System. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (San Francisco, CA, December 2004), pp. 363--378. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Rosenblum, M., Bugnion, E., Herrod, S. A., Witchel, E., and Gupta, A. The impact of architectural trends on operating system performance. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (Copper Mountain, CO, December 1995), pp. 285--298. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Schmuck, F., and Wyllie, J. Experience with transactions in QuickSilver. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (October 1991), pp. 239--53. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Shepler, S., Callaghan, B., Robinson, D., Thurlow, R., Beame, C., Eisler, M., and Noveck, D. Network File System (NFS) version 4 Protocol. Tech. Rep. RFC 3530, IETF, April 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Spector, A. Z., Daniels, D., Duchamp, D., Eppinger, J. L., and Pauch, R. Distributed transactions for reliable systems. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 127--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Srinivasan, S., Andrews, C., Kandula, S., and Zhou, Y. Flashback: A light-weight extension for rollback and deterministic replay for software debugging. In Proceedings of the 2004 USENIX Technical Conference (Boston, MA, June 2004). Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Srinivasan, V., and Mogul, J. Spritely NFS: Experiments with cache consistency protocols. In Proceedings of the 12th ACM Symposium on Operating System Principles (December 1989), pp. 45--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Steffan, J. G., Colohan, C. B., Zhai, A., and Mowry, T. C. A scalable approach to thread-level speculation. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA) (Vancouver, Canada, June 2000), pp. 1--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Weinstein, M. J., Thomas W. Page, J., Livezey, B. K., and Popek, G. J. Transactions and synchronization in a distributed operating system. In Proceedings of the 10th ACM Symposium on Operating Systems Principles (Orcas Island, WA, December 1985), pp. 115--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Zhang, Y., Rauchwerger, L., and Torrellas, J. Hardware for speculative parallelization of partially-parallel loops in DSM multiprocessors. In Proc. of the 5th Intl. Symposium on High Performance Computer Architecture (Orlando, FL, January 1999), p. 135. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Zhu, N., and Chiueh, T. Design, implementation and evaluation of the Repairable File Service. In Proceedings of the International Conference on Dependable Systems and Networks (San Francisco, CA, June 2003).Google ScholarGoogle Scholar

Index Terms

  1. Speculative execution in a distributed file system

          Recommendations

          Reviews

          Bayard Kohlhepp

          Make it run; make it right; make it fast; make it small. The anonymous mantra above captures the evolution of all software (except for the "make it bloat" phase of overly mature products). Distributed processing has passed the novelty stage where just making it run was a thrill. Most improvements in the past decade have been targeted at making it right; in other words, ensuring correct, consistent data in the face of increasingly complicated user interaction scenarios. Pervasive Internet availability and the proliferation of handheld devices are currently motivating research into making software fast and small. It is the move of both casual and mission-critical applications to the Internet that has made improved performance of distributed software a primary goal. However, as in many other domains, designers have continually faced a tradeoff between making software fast and right; you could only increase one attribute by decreasing the other. The authors of this paper have broken through this dilemma. Their speculative techniques vastly improve performance without sacrificing correctness. Moreover, these are not just theoretical or hypothetical performance gains suggested by isolated laboratory experiments. The authors implement their techniques in off-the-shelf commodity software and benchmark the changes. Theirs is proven, real-world success. Their core finding is that it is cheaper to burn central processing unit (CPU) cycles and random access memory (RAM) (abundant resources) on checkpoints that are usually discarded than to waste time waiting for a synchronous remote procedure call to complete (time is the scarce resource). Because of longer latency times, Internet-based applications show even greater performance improvements than local area network applications. It?s that rare case where an increase in need is met with an increase in satisfaction. The paper is 15 pages long with one full page of bibliographic references (the references alone provide a great survey of the field). It is easy to read, and is more like a Communications of the ACM article than a research paper. A download of the base BlueFS file system they modified is available at http://notrump.eecs.umich.edu/group/group.html, the home page of the Pervasive Computing Research Group at University of Michigan. However, source code to the Speculator library or the modified file system is not available. The authors have revealed enough information to recreate their speculative library, but it would be great to see their work released under an open source license. Online Computing Reviews Service

          Access critical reviews of Computing literature here

          Become a reviewer for Computing Reviews.

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in

          Full Access

          • Published in

            cover image ACM SIGOPS Operating Systems Review
            ACM SIGOPS Operating Systems Review  Volume 39, Issue 5
            SOSP '05
            December 2005
            290 pages
            ISSN:0163-5980
            DOI:10.1145/1095809
            Issue’s Table of Contents
            • cover image ACM Conferences
              SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles
              October 2005
              259 pages
              ISBN:1595930795
              DOI:10.1145/1095810

            Copyright © 2005 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 20 October 2005

            Check for updates

            Qualifiers

            • article

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader