Abstract
Data replication is used in distributed systems to maintain up-to-date copies of shared data across multiple computers in a network. However, despite decades of research, algorithms for achieving consistency in replicated systems are still poorly understood. Indeed, many published algorithms have later been shown to be incorrect, even some that were accompanied by supposed mechanised proofs of correctness. In this work, we focus on the correctness of Conflict-free Replicated Data Types (CRDTs), a class of algorithm that provides strong eventual consistency guarantees for replicated data. We develop a modular and reusable framework in the Isabelle/HOL interactive proof assistant for verifying the correctness of CRDT algorithms. We avoid correctness issues that have dogged previous mechanised proofs in this area by including a network model in our formalisation, and proving that our theorems hold in all possible network behaviours. Our axiomatic network model is a standard abstraction that accurately reflects the behaviour of real-world computer networks. Moreover, we identify an abstract convergence theorem, a property of order relations, which provides a formal definition of strong eventual consistency. We then obtain the first machine-checked correctness theorems for three concrete CRDTs: the Replicated Growable Array, the Observed-Remove Set, and an Increment-Decrement Counter. We find that our framework is highly reusable, developing proofs of correctness for the latter two CRDTs in a few hours and with relatively little CRDT-specific code.
Supplemental Material
Available for Download
- Akka 2017. The Akka actor framework for the Java Virtual Machine. (2017). http://www.akka.io/ Accessed April 2017.Google Scholar
- Paulo Sérgio Almeida, Ali Shoker, and Carlos Baquero. 2015. Efficient State-Based CRDTs by Delta-Mutation. In International Conference on Networked Systems (NETYS) . Google ScholarCross Ref
- Manamiary Bruno Andriamiarina, Dominique Méry, and Neeraj Kumar Singh. 2014. Analysis of Self-⋆ and P2P Systems Using Refinement. In Abstract State Machines, Alloy, B, TLA, VDM, and Z - 4th International Conference, ABZ 2014, Toulouse, France, June 2-6, 2014. Proceedings . 117–123. Google ScholarDigital Library
- AppJet, Inc. 2011. Etherpad and EasySync Technical Manual. (March 2011). https://github.com/ether/etherpad-lite/blob/ e2ce9dc/doc/easysync/easysync-full-description.pdfGoogle Scholar
- Hagit Attiya, Sebastian Burckhardt, Alexey Gotsman, Adam Morrison, Hongseok Yang, and Marek Zawirski. 2016. Specification and Complexity of Collaborative Text Editing. In ACM Symposium on Principles of Distributed Computing (PODC). 259–268. Google ScholarDigital Library
- Hagit Attiya, Faith Ellen, and Adam Morrison. 2015. Limitations of Highly-Available Eventually-Consistent Data Stores. In ACM Symposium on Principles of Distributed Computing (PODC) . Google ScholarDigital Library
- Noran Azmy, Stephan Merz, and Christoph Weidenbach. 2016. A Rigorous Correctness Proof for Pastry. In Abstract State Machines, Alloy, B, TLA, VDM, and Z - 5th International Conference, ABZ 2016, Linz, Austria, May 23-27, 2016, Proceedings . 86–101. Google ScholarDigital Library
- Peter Bailis and Ali Ghodsi. 2013. Eventual Consistency Today: Limitations, Extensions, and Beyond. ACM Queue 11, 3 (March 2013). Google ScholarDigital Library
- Peter Bailis and Kyle Kingsbury. 2014. The Network is Reliable. ACM Queue 12, 7 (July 2014). Google ScholarDigital Library
- Carlos Baquero, Paulo Sérgio Almeida, and Carl Lerche. 2016. The problem with embedded CRDT counters and a solution. In 2nd Workshop on the Principles and Practice of Consistency for Distributed Data (PaPoC). Google ScholarDigital Library
- Carlos Baquero, Paulo Sérgio Almeida, and Ali Shoker. 2014. Making Operation-based CRDTs Operation-based. In 14th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS) . 126–140. Google ScholarDigital Library
- Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, and Sérgio Duarte. 2012a. Brief Announcement: Semantics of Eventually Consistent Replicated Sets. In 26th International Symposium on Distributed Computing (DISC) . Google ScholarDigital Library
- Annette Bieniusa, Marek Zawirski, Nuno Preguiça, Marc Shapiro, Carlos Baquero, Valter Balegas, and Sérgio Duarte. 2012b. An Optimized Conflict-free Replicated Set . Technical Report RR-8083. http://arxiv.org/abs/1210.3368Google Scholar
- Russell Brown, Sean Cribbs, Christopher Meiklejohn, and Sam Elliott. 2014. Riak DT map: a composable, convergent replicated dictionary. In 1st Workshop on Principles and Practice of Eventual Consistency (PaPEC). Google ScholarDigital Library
- Sebastian Burckhardt. 2014. Principles of Eventual Consistency. Foundations and Trends in Programming Languages 1, 1-2 (Oct. 2014), 1–150. Google ScholarDigital Library
- Sebastian Burckhardt, Manuel Fähndrich, Daan Leijen, and Benjamin P Wood. 2012. Cloud Types for Eventual Consistency. In 26th European Conference on Object-Oriented Programming (ECOOP). Google ScholarDigital Library
- Sebastian Burckhardt, Alexey Gotsman, Hongseok Yang, and Marek Zawirski. 2014. Replicated Data Types: Specification, Verification, Optimality. In 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 271–284. Google ScholarDigital Library
- Christian Cachin, Rachid Guerraoui, and Luís Rodrigues. 2011. Introduction to Reliable and Secure Distributed Programming (second ed.). Springer. Google ScholarCross Ref
- Bernadette Charron-Bost, Henri Debrat, and Stephan Merz. 2011. Formal Verification of Consensus Algorithms Tolerating Malicious Faults. In Stabilization, Safety, and Security of Distributed Systems - 13th International Symposium, SSS 2011, Grenoble, France, October 10-12, 2011. Proceedings . 120–134. Google ScholarCross Ref
- Susan B Davidson, Hector Garcia-Molina, and Dale Skeen. 1985. Consistency in Partitioned Networks. Comput. Surveys 17, 3 (Sept. 1985), 341–370. Google ScholarDigital Library
- Aguido Horatio Davis, Chengzheng Sun, and Junwei Lu. 2002. Generalizing Operational Transformation to the Standard General Markup Language. In ACM Conference on Computer Supported Cooperative Work (CSCW). 58–67. Google ScholarDigital Library
- John Day-Richter. 2010. What’s different about the new Google Docs: Making collaboration fast. (Sept. 2010). https: //drive.googleblog.com/2010/09/whats-different-about-new-google-docs.htmlGoogle Scholar
- Henri Debrat and Stephan Merz. 2012. Verifying Fault-Tolerant Distributed Algorithms in the Heard-Of Model. Archive of Formal Proofs 2012 (2012). https://www.isa-afp.org/entries/Heard_Of.shtmlGoogle Scholar
- Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. 2007. Dynamo: Amazon’s Highly Available Key-Value Store. In 21st ACM Symposium on Operating Systems Principles (SOSP). 205–220. Google ScholarDigital Library
- Clarence Ellis and S J Gibbs. 1989. Concurrency Control in Groupware Systems. In ACM International Conference on Management of Data (SIGMOD) . 399–407. Google ScholarDigital Library
- Colin J Fidge. 1988. Timestamps in message-passing systems that preserve the partial ordering. In 11th Australian Computer Science Conference . 56–66.Google Scholar
- Pedro Fonseca, Kaiyuan Zhang, Xi Wang, and Arvind Krishnamurthy. 2017. An Empirical Study on the Correctness of Formally Verified Distributed Systems. In Proceedings of the Twelfth European Conference on Computer Systems, EuroSys 2017, Belgrade, Serbia, April 23-26, 2017 . 328–343. Google ScholarDigital Library
- Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. 2017. A framework for establishing Strong Eventual Consistency for Conflict-free Replicated Datatypes. Archive of Formal Proofs (July 2017). http: //isa-afp.org/entries/CRDT.shtml , Formal proof development.Google Scholar
- Florian Haftmann and Tobias Nipkow. 2010. Code Generation via Higher-Order Rewrite Systems. In Functional and Logic Programming, 10th International Symposium, FLOPS 2010, Sendai, Japan, April 19-21, 2010. Proceedings . 103–117. Google ScholarDigital Library
- Florian Haftmann and Makarius Wenzel. 2008. Local Theory Specifications in Isabelle/Isar. In Types for Proofs and Programs, International Conference, TYPES 2008, Torino, Italy, March 26-29, 2008, Revised Selected Papers . 153–168. Google ScholarDigital Library
- Claudia-Lavinia Ignat and Moira C Norrie. 2003. Customizable Collaborative Editor Relying on treeOPT Algorithm. In 8th European Conference on Computer-Supported Cooperative Work (ECSCW). 315–334. Google ScholarCross Ref
- Abdessamad Imine, Pascal Molli, Gérald Oster, and Michaël Rusinowitch. 2003. Proving Correctness of Transformation Functions in Real-Time Groupware. In 8th European Conference on Computer-Supported Cooperative Work (ECSCW). 277–293. Google ScholarCross Ref
- Abdessamad Imine, Michaël Rusinowitch, Gérald Oster, and Pascal Molli. 2006. Formal design and verification of operational transformation algorithms for copies convergence. Theoretical Computer Science 351, 2 (Feb. 2006), 167–183. Google ScholarDigital Library
- James E. Johnson, David E. Langworthy, Leslie Lamport, and Friedrich H. Vogt. 2004. Formal Specification of a Web Services Protocol. Electr. Notes Theor. Comput. Sci. 105 (2004), 147–158. Google ScholarDigital Library
- Tim Jungnickel and Tobias Herb. 2015. TP1-valid Transformation Functions for Operations on ordered n-ary Trees. arXiv:1512.05949. (Dec. 2015). https://arxiv.org/abs/1512.05949Google Scholar
- Florian Kammüller, Markus Wenzel, and Lawrence C. Paulson. 1999. Locales - A Sectioning Concept for Isabelle. In Theorem Proving in Higher Order Logics, 12th International Conference, TPHOLs’99, Nice, France, September, 1999, Proceedings . 149–166. Google ScholarCross Ref
- Kyle Kingsbury. 2013. Jepsen: Cassandra. (Sept. 2013). https://aphyr.com/posts/294-jepsen-cassandra Accessed April 2017.Google Scholar
- Martin Kleppmann and Alastair R Beresford. 2017. A Conflict-Free Replicated JSON Datatype. IEEE Transactions on Parallel and Distributed Systems (April 2017). Google ScholarDigital Library
- Leslie Lamport. 1978. Time, Clocks, and the Ordering of Events in a Distributed System. Commun. ACM 21, 7 (July 1978), 558–565. Google ScholarDigital Library
- João Leitão, José Pereira, and Luís Rodrigues. 2007. HyParView: A Membership Protocol for Reliable Gossip-Based Broadcast. In 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). 419–429. Google ScholarDigital Library
- Raph Levien. 2016. Towards a unified theory of Operational Transformation and CRDT. (July 2016). https://medium.com/ @raphlinus/towards-a-unified-theory-of-operational-transformation-and-crdt-70485876f72fGoogle Scholar
- Du Li and Rui Li. 2004. Preserving operation effects relation in group editors. In ACM Conference on Computer Supported Cooperative Work (CSCW) . 457–466. Google ScholarDigital Library
- Du Li and Rui Li. 2008. An Approach to Ensuring Consistency in Peer-to-Peer Real-Time Group Editors. Computer Supported Cooperative Work (CSCW) 17, 5 (Dec. 2008), 553–611. Google ScholarDigital Library
- Rui Li and Du Li. 2005. A landmark-based transformation approach to concurrency control in group editors. In International ACM SIGGROUP Conference on Supporting Group Work . 284–293. Google ScholarDigital Library
- Stéphane Martin, Pascal Urso, and Stéphane Weiss. 2010. Scalable XML Collaborative Editing with Undo. In On the Move to Meaningful Internet Systems (OTM) . 507–514. Google ScholarCross Ref
- Christopher Meiklejohn and Peter Van Roy. 2015. Lasp: a language for distributed, coordination-free programming. In Proceedings of the 17th International Symposium on Principles and Practice of Declarative Programming, Siena, Italy, July 14-16, 2015 . 184–195. Google ScholarDigital Library
- Brice Nédelec, Pascal Molli, and Achour Mostefaoui. 2016. CRATE: Writing Stories Together with our Browsers. In 25th International World Wide Web Conference (WWW) . 231–234. Google ScholarDigital Library
- Brice Nédelec, Pascal Molli, Achour Mostefaoui, and Emmanuel Desmontils. 2013. LSEQ: an Adaptive Structure for Sequences in Distributed Collaborative Editing. In 13th ACM Symposium on Document Engineering (DocEng). 37–46. Google ScholarDigital Library
- David A Nichols, Pavel Curtis, Michael Dixon, and John Lamping. 1995. High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System. In 8th Annual ACM Symposium on User Interface Software and Technology (UIST). 111–120. Google ScholarDigital Library
- Tobias Nipkow and Gerwin Klein. 2014. Concrete Semantics - With Isabelle/HOL. Springer. Google ScholarCross Ref
- Gérald Oster, Pascal Molli, Pascal Urso, and Abdessamad Imine. 2006a. Tombstone Transformation Functions for Ensuring Consistency in Collaborative Editing Systems. In 2nd International Conference on Collaborative Computing (CollaborateCom) . Google ScholarCross Ref
- Gérald Oster, Pascal Urso, Pascal Molli, and Abdessamad Imine. 2005. Proving correctness of transformation functions in collaborative editing systems . Technical Report RR-5795. https://hal.inria.fr/inria-00071213/Google Scholar
- Gérald Oster, Pascal Urso, Pascal Molli, and Abdessamad Imine. 2006b. Data Consistency for P2P Collaborative Editing. In ACM Conference on Computer Supported Cooperative Work (CSCW) . Google ScholarDigital Library
- Nuno Preguiça, Joan Manuel Marquès, Marc Shapiro, and Mihai Letia. 2009. A commutative replicated data type for cooperative editing. In 29th IEEE International Conference on Distributed Computing Systems (ICDCS). Google ScholarDigital Library
- Aurel Randolph, Hanifa Boucheneb, Abdessamad Imine, and Alejandro Quintero. 2015. On Synthesizing a Consistent Operational Transformation Approach. IEEE Trans. Comput. 64, 4 (April 2015), 1074–1089. Google ScholarDigital Library
- Michel Raynal and Mukesh Singhal. 1996. Logical time: capturing causality in distributed systems. IEEE Computer 29, 2 (Feb. 1996), 49–56. Google ScholarDigital Library
- Matthias Ressel, Doris Nitsche-Ruhland, and Rul Gunzenhäuer. 1996. An Integrating, Transformation-Oriented Approach to Concurrency Control and Undo in Group Editors. In ACM Conference on Computer Supported Cooperative Work (CSCW). 288–297. Google ScholarDigital Library
- Hyun-Gul Roh, Myeongjae Jeon, Jin-Soo Kim, and Joonwon Lee. 2011. Replicated abstract data types: Building blocks for collaborative applications. J. Parallel and Distrib. Comput. 71, 3 (2011), 354–368. Google ScholarDigital Library
- Hyun-Gul Roh, Jin-Soo Kim, Joonwon Lee, and Seungryoul Maeng. 2009. Optimistic Operations for Replicated Abstract Data Types . Technical Report CS/TR-2009-318. KAIST.Google Scholar
- Reinhard Schwarz and Friedemann Mattern. 1994. Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail. Distributed Computing 7, 3 (March 1994), 149–174. Google ScholarDigital Library
- Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011a. A comprehensive study of Convergent and Commutative Replicated Data Types . Technical Report 7506. INRIA.Google Scholar
- Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011b. Conflict-free Replicated Data Types. In 13th International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS) . 386–400. Google ScholarCross Ref
- Justin Sheehy. 2015. There is No Now: Problems with simultaneity in distributed systems. ACM Queue 13, 3 (March 2015). Google ScholarDigital Library
- Sergey Sinchuk, Pavel Chuprikov, and Konstantin Solomatov. 2016. Verified Operational Transformation for Trees. In 7th International Conference on Interactive Theorem Proving (ITP) . Google ScholarCross Ref
- Daniel Spiewak. 2010. Understanding and Applying Operational Transformation. (May 2010). http://www.codecommit. com/blog/java/understanding-and-applying-operational-transformationGoogle Scholar
- Maher Suleiman, Michèle Cart, and Jean Ferrié. 1997. Serialization of concurrent operations in a distributed collaborative environment. In International Conference on Supporting Group Work (GROUP). 435–445. Google ScholarDigital Library
- Maher Suleiman, Michèle Cart, and Jean Ferrié. 1998. Concurrent operations in a distributed and mobile collaborative environment. In 14th International Conference on Data Engineering (ICDE). 36–45. Google ScholarCross Ref
- Chengzheng Sun and David Chen. 2002. Consistency Maintenance in Real-Time Collaborative Graphics Editing Systems. ACM Transactions on Computer-Human Interaction (TOCHI) 9, 1 (March 2002), 1–41. Google ScholarDigital Library
- Chengzheng Sun and Clarence Ellis. 1998. Operational Transformation in Real-Time Group Editors: Issues, Algorithms, and Achievements. In ACM Conference on Computer Supported Cooperative Work (CSCW). 59–68. Google ScholarDigital Library
- Chengzheng Sun, Xiaohua Jia, Yanchun Zhang, Yun Yang, and David Chen. 1998. Achieving Convergence, Causality Preservation, and Intention Preservation in Real-Time Cooperative Editing Systems. ACM Transactions on ComputerHuman Interaction (TOCHI) 5, 1 (1998), 63–108. Google ScholarDigital Library
- Douglas B Terry, Alan J Demers, Karin Petersen, Mike J Spreitzer, Marvin M Theimer, and Brent B Welch. 1994. Session Guarantees for Weakly Consistent Replicated Data. In 3rd International Conference on Parallel and Distributed Information Systems (PDIS) . 140–149. Google ScholarCross Ref
- Mohamed Tounsi, Mohamed Mosbah, and Dominique Méry. 2013. From Event-B Specifications to Programs for Distributed Algorithms. In 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, Hammamet, Tunisia, June 17-20, 2013 . 104–109. Google ScholarDigital Library
- Mohamed Tounsi, Mohamed Mosbah, and Dominique Méry. 2016. From Event-B specifications to programs for distributed algorithms. IJAACS 9, 3/4 (2016), 223–242. Google ScholarDigital Library
- Nicolas Vidot, Michelle Cart, Jean Ferrié, and Maher Suleiman. 2000. Copies convergence in a distributed real-time collaborative environment. In ACM Conference on Computer Supported Cooperative Work (CSCW). 171–180. Google ScholarDigital Library
- Werner Vogels. 2009. Eventually consistent. Commun. ACM 52, 1 (Jan. 2009), 40–44. Google ScholarDigital Library
- David Wang, Alex Mah, Soren Lassen, and Sam Thorogood. 2015. Apache Wave (incubating) Protocol Documentation, Release 0.4. Apache Software Foundation. (Aug. 2015). https://people.apache.org/~al/wave_docs/ApacheWaveProtocol-0.4.pdfGoogle Scholar
- Stéphane Weiss, Pascal Urso, and Pascal Molli. 2010. Logoot-Undo: Distributed Collaborative Editing System on P2P networks. IEEE Transactions on Parallel and Distributed Systems 21, 8 (Jan. 2010), 1162–1174. Google ScholarDigital Library
- Makarius Wenzel, Lawrence C. Paulson, and Tobias Nipkow. 2008. The Isabelle Framework. In Theorem Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, Montreal, Canada, August 18-21, 2008. Proceedings . 33–38. Google ScholarDigital Library
- James R. Wilcox, Doug Woos, Pavel Panchekha, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Thomas E. Anderson. 2015. Verdi: a framework for implementing and formally verifying distributed systems. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, Portland, OR, USA, June 15-17, 2015 . 357–368. Google ScholarDigital Library
- Peter Zeller, Annette Bieniusa, and Arnd Poetzsch-Heffter. 2014. Formal Specification and Verification of CRDTs. In 34th IFIP International Conference on Formal Techniques for Distributed Objects, Components and Systems (FORTE) . Google ScholarCross Ref
Index Terms
- Verifying strong eventual consistency in distributed systems
Recommendations
'Cause I'm strong enough: Reasoning about consistency choices in distributed systems
POPL '16: Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming LanguagesLarge-scale distributed systems often rely on replicated databases that allow a programmer to request different data consistency guarantees for different operations, and thereby control their performance. Using such databases is far from trivial: ...
A Generic Replicated Data Type for Strong Eventual Consistency
PaPoC '19: Proceedings of the 6th Workshop on Principles and Practice of Consistency for Distributed DataConflict-free replicated data types (CRDTs) [7] aid programmers develop highly available and scalable distributed systems. However, CRDTs require operations to commute which is not practical. This means that programmers cannot replicate regular objects ...
Constraining the eventual in eventual consistency
PaPoC '18: Proceedings of the 5th Workshop on the Principles and Practice of Consistency for Distributed DataCRDTs are highly available replicated data structures which offer strong eventual consistency in the face of concurrent operations [3]. By their definition, CRDTs eventually converge to a consistent state given enough time. However, this is not strict ...
Comments