skip to main content
10.1145/2723372.2751519acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

On the Design and Scalability of Distributed Shared-Data Databases

Published: 27 May 2015 Publication History

Abstract

Database scale-out is commonly implemented by partitioning data across several database instances. This approach, however, has several restrictions. In particular, partitioned databases are inflexible in large-scale deployments and assume a partition-friendly workload in order to scale. In this paper, we analyze an alternative architecture design for distributed relational databases that overcomes the limitations of partitioned databases. The architecture is based on two fundamental principles: We decouple query processing and transaction management from data storage, and we share data across query processing nodes. The combination of these design choices provides scalability, elasticity, and operational flexibility without making any assumptions on the workload. As a drawback, sharing data among multiple database nodes causes synchronization overhead. To address this limitation, we introduce techniques for scalable transaction processing in shared-data environments. Specifically, we describe mechanisms for efficient data access, concurrency control, and data buffering. In combination with new hardware trends, the techniques enable performance characteristics that top state-of-the-art partitioned databases.

References

[1]
M. Aguilera, A. Merchant, M. Shah, A. Veitch, and C. Karamanolis. Sinfonia: a new paradigm for building scalable distributed systems. SOSP'07, pages 159--174, 2007.
[2]
Apache Hadoop. http://hadoop.apache.org/. Nov. 06, 2014.
[3]
J. Baker, C. Bond, J. Corbett, J. Furman, A. Khorlin, J. Larson, J.-M. Léon, Y. Li, A. Lloyd, and V. Yushprakh. Megastore: providing scalable, highly available storage for interactive services. CIDR'11, pages 223--234, 2011.
[4]
H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, and P. O'Neil. A critique of ANSI SQL isolation levels. SIGMOD'95, pages 1--10, 1995.
[5]
P. Bernstein and N. Goodman. Concurrency control in distributed database systems. ACM Comput. Surv., 13(2):185--221, 1981.
[6]
P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency control and recovery in database systems. Addison-Wesley, 1987.
[7]
P. Bernstein, C. Reid, and S. Das. Hyder - a transactional record manager for shared flash. CIDR'11, pages 9--20, 2011.
[8]
M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building a database on S3. SIGMOD'08, pages 251--264, 2008.
[9]
M. Cahill, U. Röhm, and A. Fekete. Serializable isolation for snapshot databases. SIGMOD'08, pages 729--738, 2008.
[10]
D. Campbell, G. Kakivaya, and N. Ellis. Extreme scale with full SQL language support in microsoft SQL Azure. SIGMOD'10, pages 1021--1024, 2010.
[11]
S. Chandrasekaran and R. Bamford. Shared cache - the future of parallel databases. ICDE'03, 2003.
[12]
F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber. Bigtable: a distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, 2008.
[13]
J. C. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. J. Furman, S. Ghemawat, A. Gubarev, C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik, D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor, R. Wang, and D. Woodford. Spanner: Google's globally-distributed database. OSDI'12, pages 251--264, 2012.
[14]
S. Das, D. Agrawal, and A. El Abbadi. G-Store: a scalable data store for transactional multi key access in the cloud. SoCC'10, pages 163--174, 2010.
[15]
S. Das, D. Agrawal, and A. El Abbadi. ElasTraS: an elastic, scalable, and self-managing transactional database for the cloud. ACM Trans. Database Syst., 38(1):5:1--5:45, 2013.
[16]
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SOSP'07, pages 205--220, 2007.
[17]
D. DeWitt and J. Gray. Parallel database systems: the future of high performance database systems. Commun. ACM, 35(6):85--98, 1992.
[18]
A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, and D. Shasha. Making snapshot isolation serializable. ACM Trans. Database Syst., 30(2):492--528, 2005.
[19]
FoundationDB. https://foundationdb.com/. Feb. 07, 2015.
[20]
D. Gomez Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. Omid: lock-free transactional support for distributed data stores. ICDE'14, pages 676--687, 2014.
[21]
G. Graefe. A survey of B-tree locking techniques. ACM Trans. Database Syst., 35(3):16:1--16:26, 2010.
[22]
J. Gray and A. Reuter. Transaction processing: concepts and techniques. Morgan Kaufmann, 1992.
[23]
T. Horikawa. Latch-free data structures for DBMS: design, implementation, and evaluation. SIGMOD'13, pages 409--420, 2013.
[24]
InfiniBand. http://www.infinibandta.org/. Nov. 06, 2014.
[25]
E. Jensen, G. Hagensen, and J. Broughton. A new approach to exclusive data access in shared memory multiprocessors. Technical Report UCRL-97663, 1987.
[26]
E. P. Jones, D. J. Abadi, and S. Madden. Low overhead concurrency control for partitioned main memory databases. SIGMOD'10, pages 603--614, 2010.
[27]
S. Jorwekar, A. Fekete, K. Ramamritham, and S. Sudarshan. Automating the detection of snapshot isolation anomalies. VLDB'07, pages 1263--1274, 2007.
[28]
J. Josten, C. Mohan, I. Narang, and J. Teng. DB2's use of the coupling facility for data sharing. IBM Systems Journal, 36(2):327--351, 1997.
[29]
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. Abadi. H-store: a high-performance, distributed main memory transaction processing system. Proc. VLDB Endow., 1(2):1496--1499, 2008.
[30]
A. Kemper and T. Neumann. HyPer: a hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. ICDE'11, pages 195--206, 2011.
[31]
H. T. Kung and J. T. Robinson. On optimistic methods for concurrency control. ACM Trans. Database Syst., 6(2):213--226, 1981.
[32]
P.-A. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. Proc. VLDB Endow., 5(4):298--309, 2011.
[33]
P. Lehman and S. B. Yao. Efficient locking for concurrent operations on B-trees. ACM Trans. Database Syst., 6(4):650--670, 1981.
[34]
J. Levandoski, D. Lomet, M. Mokbel, and K. Zhao. Deuteronomy: transaction support for cloud data. CIDR'11, pages 123--133, 2011.
[35]
J. Levandoski, D. Lomet, and S. Sengupta. The Bw-tree: a B-tree for new hardware platforms. ICDE'13, pages 302--313, 2013.
[36]
D. Lomet, R. Anderson, T. Rengarajan, and P. Spiro. How the Rdb/VMS data sharing system became fast. Technical Report CRL 92/4, 1992.
[37]
D. Lomet, A. Fekete, G. Weikum, and M. Zwilling. Unbundling transaction services in the cloud. CIDR'09, 2009.
[38]
M. Mages. ABA prevention using single-word instructions. Technical Report RC23089 (W0401-136), 2004.
[39]
C. Mohan. History repeats itself: sensible and Nonsen aspects of the NoSQL hoopla. EDBT'13, pages 11--16, 2013.
[40]
T. Mühlbauer, W. Rödiger, A. Reiser, A. Kemper, and T. Neumann. ScyPer: elastic OLAP throughput on transactional data. DanaC'13, pages 11--15, 2013.
[41]
MySQL Cluster. http://www.mysql.com/products/cluster/. Nov. 06, 2014.
[42]
D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. SOSP'11, pages 29--41, 2011.
[43]
J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. Rumble, E. Stratmann, and R. Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121--130, 2011.
[44]
V. Raman, G. Swart, L. Qiao, F. Reiss, V. Dialani, D. Kossmann, I. Narang, and R. Sidle. Constant-time query processing. ICDE'08, pages 60--69, 2008.
[45]
J. Rao, E. Shekita, and S. Tata. Using paxos to build a scalable, consistent, and highly available datastore. Proc. VLDB Endow., 4(4):243--254, 2011.
[46]
W. Ronald. A technical overview of the Oracle Exadata database machine and Exadata storage server. Technical Report Oracle White Paper, 2012.
[47]
M. Rys. Scalable SQL. Commun. ACM, 54(6):48--53, 2011.
[48]
M. Serafini, E. Mansour, A. Aboulnaga, K. Salem, T. Rafiq, and U. F. Minhas. Accordion: Elastic scalability for database systems supporting distributed transactions. Proc. VLDB Endow., 7(12), 2014.
[49]
J. Shute, R. Vingralek, B. Samwel, et al. F1: a distributed SQL database that scales. Proc. VLDB Endow., 6(11):1068--1079, 2013.
[50]
A. Singhal, R. Van der Wijngaart, and P. Barry. Atomic read modify write primitives for I/O devices. Technical Report Intel White Paper, 2008.
[51]
G. H. Sockut and B. R. Iyer. Online reorganization of databases. ACM Comput. Surv., 41(3):14:1--14:136, 2009.
[52]
M. Stonebraker and R. Cattell. 10 rules for scalable performance in 'simple operation' datastores. Commun. ACM, 54(6):72--80, 2011.
[53]
M. Stonebraker, S. Madden, D. J. Abadi, S. Harizopoulos, N. Hachem, and P. Helland. The end of an architectural era: (it's time for a complete rewrite). VLDB'07, pages 1150--1160, 2007.
[54]
M. Stonebraker and A. Weisberg. The VoltDB main memory DBMS. IEEE Data Eng. Bull., 36(2):21--27, 2013.
[55]
R. Taft, E. Mansour, M. Serafini, J. Duggan, A. J. ElmoreA, A. Aboulnaga, A. Pavlo, and M. Stonebraker. E-store: Fine-grained elastic partitioning for distributed transaction processing systems. Proc. VLDB Endow., 8(3), 2014.
[56]
A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi. Calvin: fast distributed transactions for partitioned database systems. SIGMOD'12, pages 1--12, 2012.
[57]
Transaction Processing Performance Council (TPC). TPC Benchmark C Specification ver. 5.11, 2010.
[58]
S. Tu, W. Zheng, E. Kohler, B. Liskov, and S. Madden. Speedy transactions in multicore in-memory databases. SOSP'13, pages 18--32, 2013.
[59]
P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable performance for unpredictable workloads. Proc. VLDB Endow., 2(1):706--717, 2009.
[60]
VoltDB. http://www.voltdb.com/. Nov. 06, 2014.
[61]
M. Yabandeh and D. Gómez Ferro. A critique of snapshot isolation. EuroSys'12, pages 155--168, 2012.

Cited By

View all
  • (2025)Synchronizing Disaggregated Data Structures with One-Sided RDMA: Pitfalls, Experiments and Design GuidelinesACM Transactions on Database Systems10.1145/3716377Online publication date: 14-Feb-2025
  • (2025)Scalable Data Management on Next-Generation Data Center NetworksScalable Data Management for Future Hardware10.1007/978-3-031-74097-8_8(199-221)Online publication date: 24-Jan-2025
  • (2024)Hybrid Shared-Buffer for Multi-Master DatabasesJournal of Database Management10.4018/JDM.35692035:1(1-27)Online publication date: 7-Nov-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. decoupled storage
  2. optimistic concurrency control
  3. shared-data
  4. transaction processing

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'15
Sponsor:
SIGMOD/PODS'15: International Conference on Management of Data
May 31 - June 4, 2015
Victoria, Melbourne, Australia

Acceptance Rates

SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)85
  • Downloads (Last 6 weeks)10
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Synchronizing Disaggregated Data Structures with One-Sided RDMA: Pitfalls, Experiments and Design GuidelinesACM Transactions on Database Systems10.1145/3716377Online publication date: 14-Feb-2025
  • (2025)Scalable Data Management on Next-Generation Data Center NetworksScalable Data Management for Future Hardware10.1007/978-3-031-74097-8_8(199-221)Online publication date: 24-Jan-2025
  • (2024)Hybrid Shared-Buffer for Multi-Master DatabasesJournal of Database Management10.4018/JDM.35692035:1(1-27)Online publication date: 7-Nov-2024
  • (2023)Design Guidelines for Correct, Efficient, and Scalable Synchronization using One-Sided RDMAProceedings of the ACM on Management of Data10.1145/35892761:2(1-26)Online publication date: 20-Jun-2023
  • (2022)A study of database performance sensitivity to experiment settingsProceedings of the VLDB Endowment10.14778/3523210.352322115:7(1439-1452)Online publication date: 1-Mar-2022
  • (2022)EFA: A Viable Alternative to RDMA over InfiniBand for DBMSs?Proceedings of the 18th International Workshop on Data Management on New Hardware10.1145/3533737.3538506(1-5)Online publication date: 12-Jun-2022
  • (2022)In-memory transaction processing: efficiency and scalability considerationsKnowledge and Information Systems10.1007/s10115-019-01340-761:3(1209-1240)Online publication date: 11-Mar-2022
  • (2021)FoundationDB: A Distributed Unbundled Transactional Key Value StoreProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457559(2653-2666)Online publication date: 9-Jun-2021
  • (2021)Nova-LSM: A Distributed, Component-based LSM-tree Key-value StoreProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457297(749-763)Online publication date: 9-Jun-2021
  • (2021)An RDBMS-only architecture for web applications2021 XLVII Latin American Computing Conference (CLEI)10.1109/CLEI53233.2021.9640017(1-9)Online publication date: 25-Oct-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media