ABSTRACT
Currently, users of geo-distributed storage systems face a hard choice between having serializable transactions with high latency, or limited or no transactions with low latency. We show that it is possible to obtain both serializable transactions and low latency, under two conditions. First, transactions are known ahead of time, permitting an a priori static analysis of conflicts. Second, transactions are structured as transaction chains consisting of a sequence of hops, each hop modifying data at one server. To demonstrate this idea, we built Lynx, a geo-distributed storage system that offers transaction chains, secondary indexes, materialized join views, and geo-replication. Lynx uses static analysis to determine if each hop can execute separately while preserving serializability---if so, a client needs wait only for the first hop to complete, which occurs quickly. To evaluate Lynx, we built three applications: an auction service, a Twitter-like microblogging site and a social networking site. These applications successfully use chains to achieve low latency operation and good throughput.
Supplemental Material
- http://rubis.ow2.org/index.html as of Oct 2010.Google Scholar
- Apache cassandra database. http://cassandra.apache.org/.Google Scholar
- Hbase: Hadoop database. http://hbase.apache.org.Google Scholar
- MongoDB. http://www.mongodb.com.Google Scholar
- P. Agrawal, A. Silberstein, B. Cooper, U. Srivastava, and R. Ramakrishnan. Asynchronous view maintenance for VLSD databases. In International Conference on Management of Data, June 2009. Google ScholarDigital Library
- M. K. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: A new paradigm for building scalable distributed systems. ACM Transactions on Computer Systems, 27(3):5:1--5:48, Nov. 2009. Google ScholarDigital Library
- C. Amza, A. Chanda, A. Cox, S. Elnikety, R. Gil, K. Rajamani, W. Zwaenepoel, E. Cecchet, and J. Marguerite. Specification and implementation of dynamic Web site benchmarks. In IEEE International Workshop on Workload Characterization, Nov. 2002.Google ScholarCross Ref
- P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. The potential dangers of causal consistency and an explicit solution. In ACM Symposium on Cloud Computing, Oct. 2012. Google ScholarDigital Library
- P. Bailis, A. Fekete, A. Ghodsi, J. M. Hellerstein, and I. Stoica. HAT, not CAP: Highly available transactions. In Workshop on Hot Topics in Operating Systems, May 2013. Google ScholarDigital Library
- J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Leon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: Providing scalable, highly available storage for interactive services. In Conference on Innovative Database Systems Research, Jan. 2011.Google Scholar
- A. J. Bernstein, D. S. Gerstl, and P. M. Lewis. Concurrency control for step-decomposed transactions. Information Systems, 24(9):673--698, Dec. 1999. Google ScholarDigital Library
- P. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Computing Survey, 13(2):185--221, June 1981. Google ScholarDigital Library
- J. Blakeley, P. Larson, and F. Tompa. Efficiently updating materialized views. In International Conference on Management of Data, May 1986. Google ScholarDigital Library
- W. Bolosky, D. Bradshaw, R. Haagens, N. Kusters, and P. Li. Paxos replicated state machines as the basis of a high-performance data store. In Symposium on Networked Systems Design and Implementation, Mar. 2011. Google ScholarDigital Library
- H. Boral, W. Alexander, L. Clay, G. Copeland, S. Danforth, M. Franklin, B. Hart, M. Smith, and P. Valduriez. Prototyping Bubba, a highly available parallel database system. Transactions on Knowledge and Data Engineering, 2(1):4--24, Mar. 1990. Google ScholarDigital Library
- Y. Breitbart, H. Garcia-Molina, and A. Silberschatz. Overview of multidatabase transaction management. The VLDB Journal, 1(2):181--239, Oct. 1992. Google ScholarDigital Library
- M. Cafarella, E. Chang, A. Fikes, A. Halevy, W. Hsieh, A. Lerner, J. Madhavan, and S. Muthukrishnan. Data management projects at Google. In International Conference on Management of Data, June 2008.Google ScholarDigital Library
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. In Symposium on Operating Systems Design and Implementation, Nov. 2006. Google ScholarDigital Library
- J. Cipar, G. Ganger, K. Keeton, C. B. Morrey, III, C. A. N. Soules, and A. Veitch. LazyBase: Trading freshness for performance in a scalable database. In European Conference on Computer Systems, Apr. 2012. Google ScholarDigital Library
- L. Colby, T. Griffin, L. Libkin, I. Mumick, and H. Trickey. Algorithms for deferred view maintenance. In International Conference on Management of Data, June 1996. Google ScholarDigital Library
- B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni. PNUTS: Yahoo!'s hosted data serving platform. 1(2):1277--1288, Aug. 2008. Google ScholarDigital Library
- J. Corbett et al. Spanner: Google's globally-distributed database. In Symposium on Operating Systems Design and Implementation, Oct. 2012. Google ScholarDigital Library
- S. Davies. Data processing spheres of control. IBM Systems Journal, 17(2):179--198, June 1978. Google ScholarDigital Library
- G. DeCandia et al. Dynamo: Amazon's highly available key-value store. In ACM Symposium on Operating Systems Principles, Oct. 2007. Google ScholarDigital Library
- D. Dewitt, S. Ghandeharizadeh, D. Schneider, A. Bricker, H. i Hsiao, and R. Rasmussen. The Gamma database machine project. Transactions on Knowledge and Data Engineering, 2(1):44--62, Mar. 1990. Google ScholarDigital Library
- D. DeWitt and J. Gray. Parallel database systems: The future of high performance database systems. Communications of the ACM, 35(6):85--98, June 1992. Google ScholarDigital Library
- R. Escriva, B. Wong, and E. G. Sirer. HyperDex: A distributed, searchable key-value store for cloud computing. In ACM SIGCOMM Conference, Aug. 2012. Google ScholarDigital Library
- M. Franklin. Concurrency control and recovery. The Computer Science and Engineering Handbook, pages 1058--1077, 1997.Google Scholar
- H. Garcia-Molina. Using semantic knowledge for transaction processing in a distributed database. ACM Transactions on Database Systems, 8(2):186--213, June 1983. Google ScholarDigital Library
- H. Garcia-Molina and K. Salem. SAGAS. In International Conference on Management of Data, May 1987. Google ScholarDigital Library
- S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In ACM Symposium on Operating Systems Principles, Oct. 2003. Google ScholarDigital Library
- D. K. Gifford. Information storage in a decentralized computer system. Technical Report CSL-81-8, Xerox Parc, Mar. 1982. Extended version of the Ph.D. thesis of D. K. Gifford. Google ScholarDigital Library
- J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, USA, 1993. Google ScholarDigital Library
- H. He, J. Xie, J. Yang, and H. Yu. Asymmetric batch incremental view maintenance. In International Conference on Data Engineering, Apr. 2005. Google ScholarDigital Library
- E. P. C. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. In International Conference on Management of Data, June 2010. Google ScholarDigital Library
- N. Kallen. Big data in real time at Twitter. QCon: The annual international software development conference, Nov. 2010. http://www.slideshare.net/nkallen/q-con-3770885.Google Scholar
- T. Kraska, G. Pang, M. Franklin, S. Madden, and A. Fekete. MDCC: Multi-data center consistency. In European Conference on Computer Systems, Apr. 2013. Google ScholarDigital Library
- C. Li, D. Porto, A. Clement, R. Rodrigues, N. Preguia, and J. Gehrke. Making geo-replicated systems fast if possible, consistent when necessary. In Symposium on Operating Systems Design and Implementation, Oct. 2012. Google ScholarDigital Library
- W. Lloyd, M. Freedman, M. Kaminsky, and D. Andersen. Don't settle for eventual: Stronger consistency for wide-area storage with COPS. In ACM Symposium on Operating Systems Principles, Oct. 2011. Google ScholarDigital Library
- W. Lloyd, M. Freedman, M. Kaminsky, and D. Andersen. Stronger semantics for low-latency geo-replicated storage. In Symposium on Networked Systems Design and Implementation, Apr. 2013. Google ScholarDigital Library
- H. Mahmoud, F. Nawab, A. Pucher, D. Agrawal, and A. E. Abbadi. Low-latency multi-datacenter databases using replicated commit. Proceedings of the VLDB Endowment, 6(9):661--672, July 2013.Google ScholarDigital Library
- C. Mohan, B. Lindsay, and R. Obermarck. Transaction management in the R* distributed database management system. ACM Transactions on Database Systems, 11(4):378--396, Dec. 1986. Google ScholarDigital Library
- D. Peng and F. Dabek. Incremental processing of large data sets. In Symposium on Operating Systems Design and Implementation, Oct. 2010.Google Scholar
- D. Pritchett. BASE: An acid alternative. ACM Queue, 6(3):48--55, May 2008. Google ScholarDigital Library
- D. Quass and J. Widom. On-line warehouse view maintenance. In International Conference on Management of Data, May 1997. Google ScholarDigital Library
- E. Schurman and J. Brutlag. The user and business impact of server delays, additional bytes, and HTTP chunking in web search. In Velocity Web Performance and Operations Conference, June 2009.Google Scholar
- M. Shapiro, N. Preguiça, C. Baquero, and M. Zawirski. Conflict-free replicated data types. In International Symposium on Stabilization, Safety, and Security of Distributed Systems, Oct. 2011. Google ScholarDigital Library
- D. Shasha, F. Llirbat, E. Simon, and P. Valduriez. Transaction chopping: Algorithms and performance studies. ACM Transactions on Database Systems, 20(3):325--363, Sept. 1995. Google ScholarDigital Library
- A. Silberstein, J. Terrace, B. Cooper, and R. Ramakrishnan. Feeding frenzy: Selectively materializing users' event feeds. In International Conference on Management of Data, June 2010. Google ScholarDigital Library
- Y. Sovran, R. Power, M. K. Aguilera, and J. Li. Transactional storage for geo-replicated systems. In ACM Symposium on Operating Systems Principles, Oct. 2011. Google ScholarDigital Library
- J. Stribling, Y. Sovran, I. Zhang, X. Pretzer, J. Li, F. Kaashoek, and R. Morris. Simplifying wide-area application development with WheelFS. In Symposium on Networked Systems Design and Implementation, Apr. 2009.Google Scholar
- D. Terry, A. Demers, K. Petersen, M. Spreitzer, M. Theimer, and B. Welch. Session guarantees for weakly consistent replicated data. In International Conference on Parallel and Distributed Information Systems, Sept. 1994. Google ScholarDigital Library
- A. Thomson and D. J. Abadi. The case for determinism in database systems. Proceedings of the VLDB Endowment, 3(1):70--80, Sept. 2010. Google ScholarDigital Library
- G. Weikum and G. Vossen. Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001. Google ScholarDigital Library
- J. Zhou, P. Larson, and H. Elmongui. Lazy maintenance of materialized views. In International Conference on Very Large Data Bases, Sept. 2007. Google ScholarDigital Library
Recommendations
Transaction chopping: algorithms and performance studies
Chopping transactions into pieces is good for performance but may lead to nonserializable executions. Many researchers have reacted to this fact by either inventing new concurrency-control mechanisms, weakening serializability, or both. We adopt a ...
Transaction communicators: enabling cooperation among concurrent transactions
PPoPP '11In this paper, we propose to extend transactional memory with transaction communicators, special objects through which concurrent transactions can communicate: changes by one transaction to a communicator can be seen by concurrent transactions before ...
A read-only transaction anomaly under snapshot isolation
Snapshot Isolation (SI), is a multi-version concurrency control algorithm introduced in [BBGMOO95] and later implemented by Oracle. SI avoids many concurrency errors, and it never delays read-only transactions. However it does not guarantee ...
Comments