research-article

Improving Bandwidth Efficiency for Consistent Multistream Storage

Authors:
Andromachi Hatzieleftheriou, University of Ioannina
Stergios V. Anastasiadis, University of Ioannina

ACM Transactions on Storage, Volume 9, Issue 1, Article No. 2, pp. 1–27. https://doi.org/10.1145/2435204.2435206

Published: 01 March 2013

Abstract

Synchronous small writes play a critical role in system availability because they safely log recent state modifications for fast recovery from crashes. Demanding systems typically dedicate separate devices to logging, both for adequate performance during normal operation and for redundancy during state reconstruction. However, storage stacks enforce page-sized granularity in data transfers from memory to disk. As a result, they consume excessive storage bandwidth to handle small writes, which hurts performance. The problem worsens because filesystems often serve multiple concurrent streams, which effectively generate random I/O traffic. In a journaled filesystem, we introduce wasteless journaling as a mount mode that coalesces synchronous concurrent small writes of data into full page-sized journal blocks. Additionally, we propose selective journaling to automatically activate wasteless journaling on data writes whose size is below a fixed threshold. We implemented a functional prototype of our design over a widely used filesystem. We compare our modes against existing methods using microbenchmarks and application-level workloads on stand-alone servers and a multitier networked system, and we examine both synchronous and asynchronous writes. Sequentially coalescing small data updates into the journal preserves filesystem consistency while reducing consumed bandwidth up to several-fold, decreasing recovery time by up to 22%, and lowering write latency by up to orders of magnitude.
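The abstract describes wasteless and selective journaling only at a high level. The following is a minimal Python sketch of the idea, not the authors' kernel implementation. It assumes a 4 KiB page and journal block, a hypothetical 16-byte record header (stream id, file offset, length), and a hypothetical 2 KiB selective-journaling threshold (the abstract does not give the threshold value). All names (`SmallWrite`, `WastelessJournal`, `commit`, `RECORD_HDR`, `THRESHOLD`) are illustrative. The cost model counts only bytes sent to the journal or transferred page-granularly for one batch of concurrent writes; it ignores device timing and later checkpointing to home locations.

```python
import struct
from dataclasses import dataclass, field

PAGE_SIZE = 4096   # assumed page size and journal block size, in bytes
THRESHOLD = 2048   # hypothetical selective-journaling cutoff, in bytes
RECORD_HDR = struct.Struct("<IQI")  # hypothetical header: stream id, offset, length (16 bytes)


@dataclass
class SmallWrite:
    stream_id: int
    offset: int   # byte offset within the file
    data: bytes


def pages_touched(w: SmallWrite) -> int:
    """Number of pages a write spans when transfers are page-granular."""
    first = w.offset // PAGE_SIZE
    last = (w.offset + len(w.data) - 1) // PAGE_SIZE
    return last - first + 1


def page_granular_bytes(writes) -> int:
    """Baseline cost model: every write transfers each full page it touches."""
    return sum(pages_touched(w) * PAGE_SIZE for w in writes)


@dataclass
class WastelessJournal:
    """Packs small records from concurrent streams into full page-sized blocks."""
    blocks: list = field(default_factory=list)             # sealed PAGE_SIZE blocks
    current: bytearray = field(default_factory=bytearray)  # block being filled

    def append(self, w: SmallWrite) -> None:
        record = RECORD_HDR.pack(w.stream_id, w.offset, len(w.data)) + w.data
        if len(record) > PAGE_SIZE:
            raise ValueError("record larger than one journal block")
        if len(self.current) + len(record) > PAGE_SIZE:
            self.seal()  # record does not fit: emit the open block, start a new one
        self.current += record

    def seal(self) -> None:
        """Pad the open block to PAGE_SIZE and emit it as one transfer."""
        if self.current:
            self.blocks.append(bytes(self.current.ljust(PAGE_SIZE, b"\0")))
            self.current = bytearray()


def commit(writes, journal: WastelessJournal) -> int:
    """Selective journaling for one batch of concurrent synchronous writes.

    Writes below THRESHOLD are coalesced into the journal; larger ones fall
    back to the page-granular path. Returns bytes transferred for this batch.
    """
    before = len(journal.blocks)
    large = [w for w in writes if len(w.data) >= THRESHOLD]
    for w in writes:
        if len(w.data) < THRESHOLD:
            journal.append(w)
    journal.seal()  # a synchronous commit forces out the partially filled block
    return page_granular_bytes(large) + (len(journal.blocks) - before) * PAGE_SIZE


if __name__ == "__main__":
    streams = [SmallWrite(s, 0, b"x" * 256) for s in range(8)]
    print(page_granular_bytes(streams), commit(streams, WastelessJournal()))  # prints: 32768 4096
```

With eight concurrent streams each issuing a 256-byte synchronous write, the page-granular baseline transfers 8 × 4096 = 32,768 bytes, while the coalesced path packs the eight 272-byte records (2,176 bytes) into a single 4,096-byte journal block. Under this toy model that is an eightfold reduction; the actual savings reported in the article depend on workload and hardware.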

Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Performance
    2. Reliability
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. File systems management
    2. Extra-functional properties
      1. Software performance
      2. Software reliability

Published in

ACM Transactions on Storage Volume 9, Issue 1
March 2013
84 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/2435204

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 March 2013
- Accepted: 1 October 2012
- Revised: 1 July 2012
- Received: 1 April 2012
Author Tags
Journaling
concurrency
logging
small writes
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 1
- Total Downloads: 432
- Downloads (Last 12 months): 3
- Downloads (Last 6 weeks): 0