research-article

Public Access

A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads

Authors:
Dejun Teng

The Ohio State University

The Ohio State University
View Profile

,
Lei Guo

Google Inc.

Google Inc.
View Profile

,
Rubao Lee

The Ohio State University

The Ohio State University
View Profile

,
Feng Chen

Louisiana State University

Louisiana State University
View Profile

,
Yanfeng Zhang

Northeastern University, China

Northeastern University, China
View Profile

,
Siyuan Ma

The Ohio State University

The Ohio State University
View Profile

,
Xiaodong Zhang

The Ohio State University

The Ohio State University
View Profile

Authors Info & Claims

ACM Transactions on Storage Volume 14 Issue 2Article No.: 15pp 1–26https://doi.org/10.1145/3162615

Published:12 April 2018Publication History

ACM Transactions on Storage

Abstract

LSM-tree has been widely used in data management production systems for write-intensive workloads. However, as read and write workloads co-exist under LSM-tree, data accesses can experience long latency and low throughput due to the interferences to buffer caching from the compaction, a major and frequent operation in LSM-tree. After a compaction, the existing data blocks are reorganized and written to other locations on disks. As a result, the related data blocks that have been loaded in the buffer cache are invalidated since their referencing addresses are changed, causing serious performance degradations.

To re-enable high-speed buffer caching during intensive writes, we propose Log-Structured buffered-Merge tree (simplified as LSbM-tree) by adding a compaction buffer on disks to minimize the cache invalidations on buffer cache caused by compactions. The compaction buffer efficiently and adaptively maintains the frequently visited datasets. In LSbM, strong locality objects can be effectively kept in the buffer cache with minimum or no harmful invalidations. With the help of a small on-disk compaction buffer, LSbM achieves a high query performance by enabling effective buffer caching, while retaining all the merits of LSM-tree for write-intensive data processing and providing high bandwidth of disks for range queries. We have implemented LSbM based on LevelDB. We show that with a standard buffer cache and a hard disk, LSbM can achieve 2x performance improvement over LevelDB. We have also compared LSbM with other existing solutions to show its strong cache effectiveness.

References

Muhammad Yousuf Ahmad and Bettina Kemme. 2015. Compaction management in distributed key-value datastores. Proc. VLDB Endow. 8, 8 (2015), 850--861. Google ScholarDigital Library
Apache. 2017. Cassandra. Retrieved April 6, 2017 from http://cassandra.apache.org/.Google Scholar
Apache. 2017. HBASE. Retrieved April 6, 2017 from https://hbase.apache.org/.Google Scholar
Berk Atikoglu, Yuehai Xu, Eitan Frachtenberg, Song Jiang, and Mike Paleczny. 2012. Workload analysis of a large-scale key-value store. In ACM SIGMETRICS Performance Evaluation Review, Vol. 40. ACM, 53--64. Google ScholarDigital Library
Basho. 2017. Riak. Retrieved April 6, 2017from http://basho.com/riak.Google Scholar
Andrei Broder and Michael Mitzenmacher. 2004. Network applications of bloom filters: A survey. Internet Math. 1, 4 (2004), 485--509.Google ScholarCross Ref
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. 2008. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. 26, 2 (2008), 4. Google ScholarDigital Library
Feng Chen, Rubao Lee, and Xiaodong Zhang. 2011. Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing. In Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture (HPCA’11). IEEE, 266--277. Google ScholarDigital Library
Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 143--154. Google ScholarDigital Library
Facebook. 2017. RocksDB. Retrieved April 6, 2017 from http://rocksdb.org/.Google Scholar
Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, and Idit Keidar. 2015. Scaling concurrent log-structured data stores. In Proceedings of the 10th European Conference on Computer Systems. ACM, 32. Google ScholarDigital Library
Google. 2017. LevelDB. Retrieved April 6, 2017 from http://code.google.com/p/leveldb.Google Scholar
Lei Guo, Dejun Teng, Rubao Lee, Feng Chen, Siyuan Ma, and Xiaodong Zhang. 2016. Re-enabling high-speed caching for LSM-trees. arXiv preprint arXiv:1606.02015 (2016).Google Scholar
H. V. Jagadish, P. P. S. Narayan, Sridhar Seshadri, S. Sudarshan, and Rama Kanneganti. 1997. Incremental organization for data recording and warehousing. In VLDB, Vol. 97. Citeseer, 16--25. Google ScholarDigital Library
Yinan Li, Bingsheng He, Robin Jun Yang, Qiong Luo, and Ke Yi. 2010. Tree indexing on solid state drives. Proc. VLDB Endow. 3, 1-2 (2010), 1195--1206. Google ScholarDigital Library
Hyeontaek Lim, David G. Andersen, and Michael Kaminsky. 2016. Towards accurate and fast evaluation of multi-stage log-structured designs. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16). USENIX Association, 149--166. Google ScholarDigital Library
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. 2016. WiscKey: Separating keys from values in SSD-conscious storage. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST’16). 133--148. Google ScholarDigital Library
Ryan Mcguire. 2014. Compaction Improvements in Cassandra 2.1. April 6, 2014 from http://www.datastax.com/dev/blog/compaction-improvements-in-cassandra-21.Google Scholar
Patrick ONeil, Edward Cheng, Dieter Gawlick, and Elizabeth ONeil. 1996. The log-structured merge-tree (LSM-tree). Acta Inform. 33, 4 (1996), 351--385. Google ScholarDigital Library
Russell Sears and Raghu Ramakrishnan. 2012. bLSM: A general purpose log structured merge tree. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. ACM, 217--228. Google ScholarDigital Library
Pradeep J. Shetty, Richard P. Spillane, Ravikant R. Malpani, Binesh Andrews, Justin Seyster, and Erez Zadok. 2013. Building workload-independent storage with VT-trees. In Presented as part of the Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST’13). 17--30. Google ScholarDigital Library
Dejun Teng, Lei Guo, Rubao Lee, Feng Chen, Siyuan Ma, Yanfeng Zhang, and Xiaodong Zhang. 2017. LSbM-tree: Re-enabling buffer caching in data management for mixed reads and writes. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS’17). IEEE, 68--79.Google ScholarCross Ref
Peng Wang, Guangyu Sun, Song Jiang, Jian Ouyang, Shiding Lin, Chen Zhang, and Jason Cong. 2014. An efficient design and implementation of LSM-tree based key-value store on open-channel SSD. In Proceedings of the 9th European Conference on Computer Systems. ACM, 16. Google ScholarDigital Library
Xingbo Wu, Yuehai Xu, Zili Shao, and Song Jiang. 2015. LSM-trie: An LSM-tree-based ultra-large key-value store for small data. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference. USENIX Association, 71--82. Google ScholarDigital Library
Kai Zhang, Kaibo Wang, Yuan Yuan, Lei Guo, Rubao Lee, and Xiaodong Zhang. 2015. Mega-kv: A case for GPUS to maximize the throughput of in-memory key-value stores. Proc. VLDB Endow. 8, 11 (2015), 1226--1237. Google ScholarDigital Library

Index Terms

A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
        Key-value stores
      2. Record and buffer management
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        File systems management

Recommendations

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores
The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excwrite performance. But LSM-Tree-based KV stores still have the overhead of write-ahead log and write stall caused by slow L₀ flush and L₀-L₁ compaction. New ...
Read More
LSM-tree managed storage for large-scale key-value store
SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

Key-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend storage, and persisting their clustered data through a file system. A file system is expected to not only provide file/directory abstraction to organize ...
Read More
Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree
ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation Conference

Key-value store has played a critical role in many large-scale data storage applications. The log-structured merge-tree (LSM-tree) based key-value store achieves excellent performance on write-intensive workloads which is mainly benefited from the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM Transactions on Storage Volume 14, Issue 2
May 2018
210 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3208078
Editor:
Sam H. Noh
Ulsan National Institute of Science and Technology, Ulsan, Korea
Issue’s Table of Contents
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 April 2018
- Revised: 1 November 2017
- Accepted: 1 November 2017
- Received: 1 April 2017
Published in tos Volume 14, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
LSM-tree
buffer cache
compaction
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 6
  Total Citations
  View Citations
- 954
  Total Downloads
- Downloads (Last 12 months)239
- Downloads (Last 6 weeks)30
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads

ACM Transactions on Storage

Abstract

References

Cited By

Index Terms

Recommendations

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

LSM-tree managed storage for large-scale key-value store

Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A Low-cost Disk Solution Enabling LSM-tree to Achieve High Performance for Mixed Read/Write Workloads

ACM Transactions on Storage

Abstract

References

Cited By

Index Terms

Recommendations

FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

LSM-tree managed storage for large-scale key-value store

Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media