ABSTRACT
Efficient transaction management is a delicate task. Systems face transactions of inherently different types, ranging from point updates to long-running analytical queries, and it is hard to satisfy all of their requirements with a single execution engine. Unfortunately, most systems rely on exactly such a single-engine design and implement its parallelism using multi-version concurrency control (MVCC). While MVCC parallelizes short-running OLTP transactions well, it struggles with mixed workloads that contain long-running OLAP queries, as scans have to work their way through vast amounts of versioned data. To overcome this problem, we reintroduce the concept of hybrid processing and combine it with state-of-the-art MVCC: OLAP queries are outsourced to run on separate virtual snapshots, while OLTP transactions run on the most recent version of the database. Inside both execution engines, we still apply MVCC.
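To illustrate why long-running scans suffer under MVCC, the following is a minimal sketch of a versioned store with newest-first version chains. It is not the implementation used in AnKer; the names `Version`, `VersionedStore`, `write`, and `read` are hypothetical and chosen for illustration only. A reader with an old snapshot timestamp has to step over every newer version before it reaches the one visible to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: int
    commit_ts: int                      # timestamp at which this version became visible
    older: Optional["Version"] = None   # link to the previous version of the same key


class VersionedStore:
    """Hypothetical toy MVCC store: one newest-first version chain per key."""

    def __init__(self):
        self.heads = {}   # key -> newest Version
        self.clock = 0    # global commit timestamp counter

    def write(self, key, value):
        # Every update prepends a new version instead of overwriting in place.
        self.clock += 1
        self.heads[key] = Version(value, self.clock, self.heads.get(key))
        return self.clock

    def read(self, key, snapshot_ts):
        # Walk the chain until a version visible at snapshot_ts is found.
        # Returns (value, number of versions inspected).
        version = self.heads.get(key)
        steps = 0
        while version is not None:
            steps += 1
            if version.commit_ts <= snapshot_ts:
                return version.value, steps
            version = version.older
        return None, steps


store = VersionedStore()
olap_ts = store.write("x", 0)        # an OLAP query starts with this snapshot
for i in range(1, 1001):             # meanwhile, 1000 OLTP updates hit the same key
    store.write("x", i)

print(store.read("x", olap_ts))      # old snapshot: traverses the whole chain
print(store.read("x", store.clock))  # newest snapshot: finds the head immediately
```

In this toy setting the old reader inspects 1001 versions for a single key, while a reader on the most recent state inspects one. Running OLAP queries on a separate snapshot removes this traversal from the analytical path.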
The most significant challenge of a hybrid approach is to generate the snapshots at a high frequency. Previous approaches suffered heavily from the high cost of snapshot creation. In our approach, termed AnKer, we follow the current trend of co-designing the DBMS with underlying system components: we overcome the restrictions of the OS by introducing a custom system call, vm_snapshot. It allows fine-granular snapshot creation that is orders of magnitude faster than state-of-the-art approaches. Our experimental evaluation on an HTAP workload based on TPC-C transactions and OLAP queries shows that our snapshotting mechanism is more than 100x faster than fork-based snapshotting and that the latency of OLAP queries is up to 4x lower than with MVCC in a single execution engine. In addition, our approach enables a higher OLTP throughput than all state-of-the-art methods.
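For context, the fork-based baseline mentioned above can be sketched in a few lines. This example shows only the baseline technique, not vm_snapshot, which is a custom system call and is not available in a stock kernel. The function name `snapshot_sum` is hypothetical, and the sketch assumes a POSIX system where `os.fork` is available. The child process sees the data as of the fork, while the parent continues to modify it.

```python
import os
import struct


def snapshot_sum(data):
    """Sum `data` as of the moment of the fork while the parent keeps updating it.

    Hypothetical helper: the forked child plays the role of the OLAP side,
    the parent plays the role of the OLTP side.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: operates on a copy-on-write snapshot of the parent's memory.
        try:
            os.close(read_fd)
            os.write(write_fd, struct.pack("q", sum(data)))
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup handlers

    # Parent: keeps applying updates to the most recent state.
    os.close(write_fd)
    for i in range(len(data)):
        data[i] += 1

    result = struct.unpack("q", os.read(read_fd, 8))[0]
    os.close(read_fd)
    os.waitpid(pid, 0)
    return result


table = list(range(100))
print(snapshot_sum(table))  # sum over the snapshot taken at fork time
print(sum(table))           # sum over the updated, most recent state
```

A fork snapshots the entire address space of the process, which is one reason its cost grows with the size of the database. The abstract describes vm_snapshot as fine-granular; its interface is not given here, so it is not sketched.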
Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting