ABSTRACT
DFSCQ is the first file system that (1) provides a precise specification for fsync and fdatasync, which allow applications to achieve high performance and crash safety, and (2) provides a machine-checked proof that its implementation meets this specification. DFSCQ's specification captures the behavior of sophisticated optimizations, including log-bypass writes, and DFSCQ's proof rules out some of the common bugs in file-system implementations despite the complex optimizations.
The key challenge in building DFSCQ is to write a specification for the file system and its internal implementation without exposing internal file-system details. DFSCQ introduces a metadata-prefix specification that captures the properties of fsync and fdatasync, which roughly follows the behavior of Linux ext4. This specification uses a notion of tree sequences---logical sequences of file-system tree states---for succinct description of the possible states after a crash and to describe how data writes can be reordered with respect to metadata updates. This helps application developers prove the crash safety of their own applications, avoiding application-level bugs such as forgetting to invoke fsync on both the file and the containing directory.
An evaluation shows that DFSCQ achieves 103 MB/s on large file writes to an SSD and durably creates small files at a rate of 1,618 files per second. This is slower than Linux ext4 (which achieves 295 MB/s for large file writes and 4,977 files/s for small file creation) but much faster than two recent verified file systems, Yggdrasil and FSCQ. Evaluation results from application-level benchmarks, including TPC-C on SQLite, mirror these microbenchmarks.
Supplemental Material
- S. Amani and T. Murray. Specifying a realistic file system. In Proceedings of the Workshop on Models for Formal Analysis of Real Systems, pages 1--9, Suva, Fiji, Nov. 2015.Google ScholarCross Ref
- S. Amani, A. Hixon, Z. Chen, C. Rizkallah, P. Chubb, L. O'Connor, J. Beeren, Y. Nagashima, J. Lim, T. Sewell, J. Tuong, G. Keller, T. Murray, G. Klein, and G. Heiser. Cogent: Verifying high-assurance file system implementations. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 175--188, Atlanta, GA, Apr. 2016. Google ScholarDigital Library
- K. Arkoudas, K. Zee, V. Kuncak, and M. Rinard. Verifying a file system implementation. In Proceedings of the 6th International Conference on Formal Engineering Methods, Seattle, WA, Nov. 2004.Google ScholarCross Ref
- G. Barthe, C. Fournet, B. Grégoire, P.-Y. Strub, N. Swamy, and S. Zanella-Béguelin. Probabilistic relational verification for cryptographic implementations. In Proceedings of the 41st ACM Symposium on Principles of Programming Languages (POPL), San Diego, CA, Jan. 2014. Google ScholarDigital Library
- W. R. Bevier and R. M. Cohen. An executable model of the Synergy file system. Technical Report 121, Computational Logic, Inc., Oct. 1996.Google Scholar
- W. R. Bevier, R. M. Cohen, and J. Turner. A specification for the Synergy file system. Technical Report 120, Computational Logic, Inc., Sept. 1995.Google Scholar
- S. S. Bhat, R. Eqbal, A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Scaling a file system to many cores using an operation log. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP), Shanghai, China, Oct. 2017. Google ScholarDigital Library
- J. Bornholt, A. Kaufmann, J. Li, A. Krishnamurthy, E. Torlak, and X. Wang. Specifying and checking file system crash-consistency models. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 83--98, Atlanta, GA, Apr. 2016. Google ScholarDigital Library
- N. Brown. Overlay filesystem. https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt.Google Scholar
- H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using Crash Hoare Logic for certifying the FSCQ file system. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), pages 18--37, Monterey, CA, Oct. 2015. Google ScholarDigital Library
- V. Chidambaram, T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), pages 228--243, Farmington, PA, Nov. 2013. Google ScholarDigital Library
- A. T. Clements, M. F. Kaashoek, N. Zeldovich, R. T. Morris, and E. Kohler. The scalable commutativity rule: Designing scalable software for multicore processors. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), pages 1--17, Farmington, PA, Nov. 2013. Google ScholarDigital Library
- Coq development team. The Coq Proof Assistant Reference Manual, Version 8.5pl2. INRIA, July 2016. http://coq.inria.fr/distrib/current/refman/.Google Scholar
- J. Corbet. ext4 and data loss. http://lwn.net/Articles/322823/, Mar. 2009.Google Scholar
- R. Cox, M. F. Kaashoek, and R. T. Morris. Xv6, a simple Unix-like teaching operating system, 2016. http://pdos.csail.mit.edu/6.828/xv6.Google Scholar
- L. Czerner. {PATCH} ext4: Fix data corruption caused by unwritten and delayed extents. https://lwn.net/Articles/645722, Apr. 2015.Google Scholar
- G. Ernst, G. Schellhorn, D. Haneberg, J. Pfähler, and W. Reif. Verification of a virtual filesystem switch. In Proceedings of the 5th Working Conference on Verified Software: Theories, Tools and Experiments, Menlo Park, CA, May 2013. Google ScholarDigital Library
- M. A. Ferreira and J. N. Oliveira. An integrated formal methods tool-chain and its application to verifying a file system model. In Proceedings of the 12th Brazilian Symposium on Formal Methods, Aug. 2009. Google ScholarDigital Library
- L. Freitas, J. Woodcock, and A. Butterfield. POSIX and the verification grand challenge: A roadmap. In Proceedings of 13th IEEE International Conference on Engineering of Complex Computer Systems, pages 153--162, Mar.-Apr. 2008. Google ScholarDigital Library
- G. R. Ganger and Y. N. Patt. Metadata update performance in file systems. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation (OSDI), pages 49--60, Monterey, CA, Nov. 1994. Google ScholarDigital Library
- P. Gardner, G. Ntzik, and A. Wright. Local reasoning for the POSIX file system. In Proceedings of the 23rd European Symposium on Programming, pages 169--188, Grenoble, France, 2014. Google ScholarDigital Library
- B. Gribincea et al. Ext4 data loss. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781, Jan. 2009.Google Scholar
- M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages Systems, 12(3):463--492, 1990. Google ScholarDigital Library
- W. H. Hesselink and M. Lali. Formalizing a hierarchical file system. In Proceedings of the 14th BCS-FACS Refinement Workshop, pages 67--85, Dec. 2009. Google ScholarDigital Library
- B. Hutchings. {PATCH 3.2 027/115} jbd2: fix fs corruption possibility in jbd2_journal_destroy() on umount path. https://lkml.org/lkml/2016/4/26/1230, April 2016.Google Scholar
- IEEE (The Institute of Electrical and Electronics Engineers) and The Open Group. The Open Group base specifications issue 7, 2013 edition (POSIX.1-2008/Cor 1-2013), Apr. 2013.Google Scholar
- D. Jones. Trinity: A Linux system call fuzz tester, 2014. http://codemonkey.org.uk/projects/trinity/.Google Scholar
- R. Joshi and G. J. Holzmann. A mini challenge: Build a verifiable filesystem. Formal Aspects of Computing, 19 (2):269--272, June 2007. Google ScholarCross Ref
- E. Kang and D. Jackson. Formal modeling and analysis of a Flash filesystem in Alloy. In Proceedings of the 1st Int'l Conference of Abstract State Machines, B and Z, pages 294--308, London, UK, Sept. 2008. Google ScholarDigital Library
- J. Kara. {PATCH} ext4: Forbid journal_async_commit in data=ordered mode. http://permalink.gmane.org/gmane.comp.file-systems.ext4/46977, Nov. 2014.Google Scholar
- J. Kara. ext4: fix crashes in dioread_nolock mode. http://permalink.gmane.org/gmane.linux.kernel.commits.head/575311, Feb. 2016.Google Scholar
- E. Koskinen and J. Yang. Reducing crash recoverability to reachability. In Proceedings of the 43rd ACM Symposium on Principles of Programming Languages (POPL), pages 97--108, St. Petersburg, FL, Jan. 2016. Google ScholarDigital Library
- Linux Kernel Developers. Ext4 filesystem, 2017. https://www.kernel.org/doc/Documentation/filesystems/ext4.txt.Google Scholar
- L. Lu, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and S. Lu. A study of Linux file system evolution. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST), pages 31--44, San Jose, CA, Feb. 2013. Google ScholarDigital Library
- J. Mickens, E. Nightingale, J. Elson, B. Fan, A. Kadav, V. Chidambaram, O. Khan, K. Nareddy, and D. Gehring. Blizzard: Fast, cloud-scale block storage for cloud-oblivious applications. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI), Seattle, WA, Apr. 2014. Google ScholarDigital Library
- K. Mostafa. {PATCH 3.13 075/103} jbd2: fix descriptor block size handling errors with journal_csum. https://lkml.org/lkml/2014/9/30/747, Sept. 2014.Google Scholar
- K. Mostafa. ext4: fix null pointer dereference when journal restart fails. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=9d506594069355d1fb2de3f9104667312ff08ed3, June 2016.Google Scholar
- D. Park and D. Shin. iJournaling: Fine-grained journaling for improving the latency of fsync system call. In Proceedings of the 2017 USENIX Annual Technical Conference, pages 787--798, Santa Clara, CA, July 2017. Google ScholarDigital Library
- A. Pavlo. Python implementation of TPC-C, 2017. https://github.com/apavlo/py-tpcc.Google Scholar
- T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. All file systems are not created equal: On the complexity of crafting crash-consistent applications. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 433--448, Broomfield, CO, Oct. 2014. Google ScholarDigital Library
- T. S. Pillai, R. Alagappana, L. Lu, V. Chidambaram, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Application crash consistency and performance with CCFS. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST), pages 181--196, Santa Clara, CA, Feb.-Mar. 2017. Google ScholarDigital Library
- J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, pages 55--74, Copenhagen, Denmark, July 2002. Google ScholarDigital Library
- T. Ridge, D. Sheets, T. Tuerk, A. Giugliano, A. Madhavapeddy, and P. Sewell. SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), pages 38--53, Monterey, CA, Oct. 2015. Google ScholarDigital Library
- X. Roche, G. Clare, K. Schwarz, P. Eggert, J. Schilling, A. Josey, and J. Pugsley. Necessary step(s) to synchronize filename operations on disk. Austin Group Defect Tracker, Mar. 2013. http://austingroupbugs.net/view.php?id=672.Google Scholar
- M. Rosenblum and J. Ousterhout. The design and implementation of a log-structured file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP), pages 1--15, Pacific Grove, CA, Oct. 1991. Google ScholarDigital Library
- E. Sandeen. {PATCH} ext4: fix unjournaled inode bitmap modification. http://permalink.gmane.org/gmane.comp.file-systems.ext4/35119, Oct. 2012.Google Scholar
- H. Sigurbjarnarson and X. Wang. Personal communication, Mar. 2017.Google Scholar
- H. Sigurbjarnarson, J. Bornholt, E. Torlak, and X. Wang. Push-button verification of file systems via crash refinement. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI), pages 1--16, Savannah, GA, Nov. 2016. Google ScholarDigital Library
- H. Sigurbjarnarson, J. Bornholt, E. Torlak, and X. Wang. The Yggdrasil toolkit, 2017. https://github.com/locore/yggdrasil/.Google Scholar
- M. Szeredi. ovl: fsync after copy-up. https://github.com/torvalds/linux/commit/641089c1549d8d3df0b047b5de7e9a111362cdce, Oct. 2016.Google Scholar
- The Linux man-pages project. fdatasync(2): Linux man page, 2017. http://man7.org/linux/man-pages/man2/fdatasync.2.html.Google Scholar
- The Open Group. fdatasync: synchronize the data of a file, 2016. http://pubs.opengroup.org/onlinepubs/9699919799/functions/fdatasync.html.Google Scholar
- T. Ts'o. {PATCH} ext4, jbd2: add REQ_FUA flag when recording an error flag. http://permalink.gmane.org/gmane.comp.file-systems.ext4/49323, July 2015.Google Scholar
- T. Ts'o. {PATCH} ext4: use private version of page_zero_new_buffers() for data=journal mode. https://lkml.org/lkml/2015/10/9/1, Oct. 2015.Google Scholar
- T. Ts'o. ext4: fix race between truncate and __ext4_journalled_writepage(). https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=bdf96838aea6a265f2ae6cbcfb12a778c84a0b8e, June 2015.Google Scholar
- S. C. Tweedie. Journaling the Linux ext2fs filesystem. In Proceedings of the 4th Annual LinuxExpo, Durham, NC, May 1998.Google Scholar
- S. Wang. Certifying checksum-based logging in the RapidFSCQ crash-safe filesystem. Master's thesis, Massachusetts Institute of Technology, June 2016.Google Scholar
- M. Wenzel. Some aspects of Unix file-system security, Aug. 2014. http://isabelle.in.tum.de/library/HOL/HOL-Unix/Unix.html.Google Scholar
- D. J. Wong. jbd2: Fix endian mixing problems in the checksumming code. http://lists.openwall.net/linux-ext4/2013/07/17/1, July 2013.Google Scholar
- D. J. Wong. {PATCH} ext4: fix same-dir rename when inline data directory overflows. http://permalink.gmane.org/gmane.comp.file-systems.ext4/45594, Aug. 2014.Google Scholar
- J. Xu and S. Swanson. NOVA: A log-structured file system for hybrid volatile/non-volatile main memories. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), pages 323--338, Santa Clara, CA, Feb. 2016. Google ScholarDigital Library
- J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), pages 273--287, San Francisco, CA, Dec. 2004. Google ScholarDigital Library
- J. Yang, C. Sar, P. Twohey, C. Cadar, and D. Engler. Automatically generating malicious disks using symbolic execution. In Proceedings of the 27th IEEE Symposium on Security and Privacy, pages 243--257, Oakland, CA, May 2006. Google ScholarDigital Library
- J. Yang, P. Twohey, D. Engler, and M. Musuvathi. eXplode: A lightweight, general system for finding serious storage system errors. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), pages 131--146, Seattle, WA, Nov. 2006. Google ScholarDigital Library
- M. Zheng, J. Tucek, D. Huang, F. Qin, M. Lillibridge, E. S. Yang, B. W. Zhao, and S. Singh. Torturing databases for fun and profit. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 449--464, Broomfield, CO, Oct. 2014. Google ScholarDigital Library
Verifying a high-performance crash-safe file system using a tree specification
Recommendations
A high performance multi-structured file system design
SOSP '91: Proceedings of the thirteenth ACM symposium on Operating systems principlesFile system I/O is increasingly becoming a performance bottleneck in large distributed computer systems. This is due to the increased file I/O demands of new applications, the inability of any single storage structure to respond to these demands, and ...
Comments