skip to main content
research-article
Open Access

Verdi: a framework for implementing and formally verifying distributed systems

Published:03 June 2015Publication History
Skip Abstract Section

Abstract

Distributed systems are difficult to implement correctly because they must handle both concurrency and failures: machines may crash at arbitrary points and networks may reorder, drop, or duplicate packets. Further, their behavior is often too complex to permit exhaustive testing. Bugs in these systems have led to the loss of critical data and unacceptable service outages. We present Verdi, a framework for implementing and formally verifying distributed systems in Coq. Verdi formalizes various network semantics with different faults, and the developer chooses the most appropriate fault model when verifying their implementation. Furthermore, Verdi eases the verification burden by enabling the developer to first verify their system under an idealized fault model, then transfer the resulting correctness guarantees to a more realistic fault model without any additional proof burden. To demonstrate Verdi's utility, we present the first mechanically checked proof of linearizability of the Raft state machine replication algorithm, as well as verified implementations of a primary-backup replication system and a key-value store. These verified systems provide similar performance to unverified equivalents.

References

  1. Amazon. Summary of the Amazon EC2 and Amazon RDS service disruption in the US East Region, Apr. 2011. http://aws.amazon.com/ message/65648/.Google ScholarGoogle Scholar
  2. S. Bishop, M. Fairbairn, M. Norrish, P. Sewell, M. Smith, and K. Wansbrough. Engineering with logic: HOL specification and symbolic-evaluation testing for TCP implementations. In Proceedings of the 33rd ACM Symposium on Principles of Programming Languages (POPL), pages 55–66, Charleston, SC, Jan. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. T. D. Chandra, R. Griesemer, and J. Redstone. Paxos made live: An engineering perspective. In Proceedings of the 26th ACM SIGACTSIGOPS Symposium on Principles of Distributed Computing (PODC), pages 398–407, Portland, OR, Aug. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 234–245, San Jose, CA, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics with The Nuprl Proof Development System. Prentice Hall, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. CoreOS. etcd: A highly-available key value store for shared configuration and service discovery, 2014.Google ScholarGoogle Scholar
  7. https://github.com/coreos/etcd.Google ScholarGoogle Scholar
  8. S. J. Garland and N. Lynch. Using I/O automata for developing distributed systems. In Foundations of Component-based Systems. Cambridge University Press, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Geels, G. Altekar, P. Maniatis, T. Roscoe, and I. Stoica. Friday: Global comprehension for distributed replay. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 285–298, Cambridge, MA, Apr. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. A. Guha, M. Reitblatt, and N. Foster. Machine-verified network controllers. In Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 483–494, Seattle, WA, June 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Z. Guo, S. McDirmid, M. Yang, L. Zhuang, P. Zhang, Y. Luo, T. Bergan, P. Bodik, M. Musuvathi, Z. Zhang, and L. Zhou. Failure recovery: When the cure is worse than the disease. In Proceedings of the HotOS XIV, Santa Ana Pueblo, NM, May 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Hayden. The Ensemble System. PhD thesis, Cornell University, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. P. Herlihy and J. M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems, 12(3):463–492, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. J. Hickey, N. Lynch, and R. van Renesse. Specifications and proofs for Ensemble layers. In Proceedings of the 15th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 119–133, York, UK, Mar. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. High Scalability. The updated big list of articles on the Amazon outage, May 2011. http://highscalability.com/blog/2011/5/2/theupdated-big-list-of-articles-on-the-amazon-outage.html.Google ScholarGoogle Scholar
  16. D. Jackson. Software Abstractions: Logic, Language, and Analysis. MIT Press, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. C. Killian, J. W. Anderson, R. Braud, R. Jhala, and A. Vahdat. Mace: Language support for building distributed systems. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 179–188, San Diego, CA, June 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Killian, J. W. Anderson, R. Jhala, and A. Vahdat. Life, death, and the critical transition: Finding liveness bugs in systems code. In Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI), pages 243–256, Cambridge, MA, Apr. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, M. Norrish, R. Kolanski, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP), pages 207–220, Big Sky, MT, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley Professional, July 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Lamport. Thinking for programmers, Apr. 2014. http://channel9.msdn.com/Events/Build/2014/3-642.Google ScholarGoogle Scholar
  22. V. Le, M. Afshari, and Z. Su. Compiler validation via equivalence modulo inputs. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 216–226, Edinburgh, UK, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. X. Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107–115, July 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. X. Liu, C. Kreitz, R. van Renesse, J. Hickey, M. Hayden, K. Birman, and R. L. Constable. Building reliable, high-performance communication systems from components. In Proceedings of the 17th ACM Symposium on Operating Systems Principles (SOSP), pages 80–92, Kiawah Island, SC, Dec. 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. X. Liu, Z. Guo, X. Wang, F. Chen, X. Lian, J. Tang, M. Wu, M. F. Kaashoek, and Z. Zhang. D 3 S: Debugging deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 423–437, San Francisco, CA, Apr. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. A. Lynch. Distributed Algorithms. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1996. ISBN 1558603484. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. A. Nanevski, G. Morrisett, A. Shinnar, P. Govereau, and L. Birkedal. Ynot: Dependent types for imperative programs. In Proceedings of the 13th ACM SIGPLAN International Conference on Functional Programming (ICFP), Victoria, British Columbia, Canada, Sept. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. Newcombe, T. Rath, F. Zhang, B. Munteanu, M. Brooker, and M. Deardeuff. Use of formal methods at Amazon Web Services, Sept. 2014. http://research.microsoft.com/enus/um/people/lamport/tla/formal-methods-amazon.pdf.Google ScholarGoogle Scholar
  29. NYTimes. Amazon’s trouble raises cloud computing doubts, Apr. 2011. http://www.nytimes.com/2011/04/23/technology/23cloud.html.Google ScholarGoogle Scholar
  30. D. Ongaro. Consensus: Bridging Theory and Practice. PhD thesis, Stanford University, Aug. 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Ongaro and J. Ousterhout. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference, pages 305–319, Philadelphia, PA, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. L. Peterson. Petri nets. ACM Computing Surveys, pages 223–252, Sept. 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. V. Rahli. Interfacing with proof assistants for domain specific programming using EventML. In Proceedings of the 10th International Workshop On User Interfaces for Theorem Provers, Bremen, Germany, July 2012.Google ScholarGoogle Scholar
  34. D. Ricketts, V. Robert, D. Jang, Z. Tatlock, and S. Lerner. Automating formal proofs for reactive systems. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 452–462, Edinburgh, UK, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. T. Ridge. Verifying distributed systems: The operational approach. In Proceedings of the 36th ACM Symposium on Principles of Programming Languages (POPL), pages 429–440, Savannah, GA, Jan. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. D. Sangiorgi and D. Walker. PI-Calculus: A Theory of Mobile Processes. Cambridge University Press, New York, NY, USA, 2001. ISBN 0521781779. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. N. Schiper, V. Rahli, R. van Renesse, M. Bickford, and R. L. Constable. Developing correctly replicated databases using formal tools. In Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pages 395–406, Atlanta, GA, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Verdi. Verdi, 2014. https://github.com/uwplse/verdi.Google ScholarGoogle Scholar
  39. M. Yabandeh, N. Kneževi´c, D. Kosti´c, and V. Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proceedings of the 5th Symposium on Networked Systems Design and Implementation (NSDI), pages 229–244, San Francisco, CA, Apr. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. J. Yang and C. Hawblitzel. Safe to the last instruction: Automated verification of a type-safe operating system. In Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 99–110, Toronto, Canada, June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. Yang, T. Chen, M. Wu, Z. Xu, X. Liu, H. Lin, M. Yang, F. Long, L. Zhang, and L. Zhou. M O D IST : Transparent model checking of unmodified distributed systems. In Proceedings of the 6th Symposium on Networked Systems Design and Implementation (NSDI), pages 213–228, Boston, MA, Apr. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proceedings of the 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 283–294, San Jose, CA, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. D. Yuan, Y. Luo, X. Zhuang, G. R. Rodrigues, X. Zhao, Y. Zhang, P. U. Jain, and M. Stumm. Simple testing can prevent most critical failures: An analysis of production failures in distributed data-intensive systems. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 249–265, Broomfield, CO, Oct. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. P. Zave. Using lightweight modeling to understand Chord. ACM SIGCOMM Computer Communication Review, 42(2):49–57, Apr. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Verdi: a framework for implementing and formally verifying distributed systems

                  Recommendations

                  Comments

                  Login options

                  Check if you have access through your login credentials or your institution to get full access on this article.

                  Sign in

                  Full Access

                  • Published in

                    cover image ACM SIGPLAN Notices
                    ACM SIGPLAN Notices  Volume 50, Issue 6
                    PLDI '15
                    June 2015
                    630 pages
                    ISSN:0362-1340
                    EISSN:1558-1160
                    DOI:10.1145/2813885
                    • Editor:
                    • Andy Gill
                    Issue’s Table of Contents
                    • cover image ACM Conferences
                      PLDI '15: Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
                      June 2015
                      630 pages
                      ISBN:9781450334686
                      DOI:10.1145/2737924

                    Copyright © 2015 Owner/Author

                    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

                    Publisher

                    Association for Computing Machinery

                    New York, NY, United States

                    Publication History

                    • Published: 3 June 2015

                    Check for updates

                    Qualifiers

                    • research-article

                  PDF Format

                  View or Download as a PDF file.

                  PDF

                  eReader

                  View online with eReader.

                  eReader