skip to main content
10.1145/3132747.3132776acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open Access

Verifying a high-performance crash-safe file system using a tree specification

Published:14 October 2017Publication History

ABSTRACT

DFSCQ is the first file system that (1) provides a precise specification for fsync and fdatasync, which allow applications to achieve high performance and crash safety, and (2) provides a machine-checked proof that its implementation meets this specification. DFSCQ's specification captures the behavior of sophisticated optimizations, including log-bypass writes, and DFSCQ's proof rules out some of the common bugs in file-system implementations despite the complex optimizations.

The key challenge in building DFSCQ is to write a specification for the file system and its internal implementation without exposing internal file-system details. DFSCQ introduces a metadata-prefix specification that captures the properties of fsync and fdatasync, which roughly follows the behavior of Linux ext4. This specification uses a notion of tree sequences---logical sequences of file-system tree states---for succinct description of the possible states after a crash and to describe how data writes can be reordered with respect to metadata updates. This helps application developers prove the crash safety of their own applications, avoiding application-level bugs such as forgetting to invoke fsync on both the file and the containing directory.

An evaluation shows that DFSCQ achieves 103 MB/s on large file writes to an SSD and durably creates small files at a rate of 1,618 files per second. This is slower than Linux ext4 (which achieves 295 MB/s for large file writes and 4,977 files/s for small file creation) but much faster than two recent verified file systems, Yggdrasil and FSCQ. Evaluation results from application-level benchmarks, including TPC-C on SQLite, mirror these microbenchmarks.

Skip Supplemental Material Section

Supplemental Material

verifying_filesystem.mp4

References

  1. S. Amani and T. Murray. Specifying a realistic file system. In Proceedings of the Workshop on Models for Formal Analysis of Real Systems, pages 1--9, Suva, Fiji, Nov. 2015.Google ScholarGoogle ScholarCross RefCross Ref
  2. S. Amani, A. Hixon, Z. Chen, C. Rizkallah, P. Chubb, L. O'Connor, J. Beeren, Y. Nagashima, J. Lim, T. Sewell, J. Tuong, G. Keller, T. Murray, G. Klein, and G. Heiser. Cogent: Verifying high-assurance file system implementations. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 175--188, Atlanta, GA, Apr. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. K. Arkoudas, K. Zee, V. Kuncak, and M. Rinard. Verifying a file system implementation. In Proceedings of the 6th International Conference on Formal Engineering Methods, Seattle, WA, Nov. 2004.Google ScholarGoogle ScholarCross RefCross Ref
  4. G. Barthe, C. Fournet, B. Grégoire, P.-Y. Strub, N. Swamy, and S. Zanella-Béguelin. Probabilistic relational verification for cryptographic implementations. In Proceedings of the 41st ACM Symposium on Principles of Programming Languages (POPL), San Diego, CA, Jan. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. W. R. Bevier and R. M. Cohen. An executable model of the Synergy file system. Technical Report 121, Computational Logic, Inc., Oct. 1996.Google ScholarGoogle Scholar
  6. W. R. Bevier, R. M. Cohen, and J. Turner. A specification for the Synergy file system. Technical Report 120, Computational Logic, Inc., Sept. 1995.Google ScholarGoogle Scholar
  7. S. S. Bhat, R. Eqbal, A. T. Clements, M. F. Kaashoek, and N. Zeldovich. Scaling a file system to many cores using an operation log. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP), Shanghai, China, Oct. 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. J. Bornholt, A. Kaufmann, J. Li, A. Krishnamurthy, E. Torlak, and X. Wang. Specifying and checking file system crash-consistency models. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 83--98, Atlanta, GA, Apr. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. N. Brown. Overlay filesystem. https://www.kernel.org/doc/Documentation/filesystems/overlayfs.txt.Google ScholarGoogle Scholar
  10. H. Chen, D. Ziegler, T. Chajed, A. Chlipala, M. F. Kaashoek, and N. Zeldovich. Using Crash Hoare Logic for certifying the FSCQ file system. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), pages 18--37, Monterey, CA, Oct. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. V. Chidambaram, T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Optimistic crash consistency. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), pages 228--243, Farmington, PA, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. T. Clements, M. F. Kaashoek, N. Zeldovich, R. T. Morris, and E. Kohler. The scalable commutativity rule: Designing scalable software for multicore processors. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), pages 1--17, Farmington, PA, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Coq development team. The Coq Proof Assistant Reference Manual, Version 8.5pl2. INRIA, July 2016. http://coq.inria.fr/distrib/current/refman/.Google ScholarGoogle Scholar
  14. J. Corbet. ext4 and data loss. http://lwn.net/Articles/322823/, Mar. 2009.Google ScholarGoogle Scholar
  15. R. Cox, M. F. Kaashoek, and R. T. Morris. Xv6, a simple Unix-like teaching operating system, 2016. http://pdos.csail.mit.edu/6.828/xv6.Google ScholarGoogle Scholar
  16. L. Czerner. {PATCH} ext4: Fix data corruption caused by unwritten and delayed extents. https://lwn.net/Articles/645722, Apr. 2015.Google ScholarGoogle Scholar
  17. G. Ernst, G. Schellhorn, D. Haneberg, J. Pfähler, and W. Reif. Verification of a virtual filesystem switch. In Proceedings of the 5th Working Conference on Verified Software: Theories, Tools and Experiments, Menlo Park, CA, May 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. A. Ferreira and J. N. Oliveira. An integrated formal methods tool-chain and its application to verifying a file system model. In Proceedings of the 12th Brazilian Symposium on Formal Methods, Aug. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Freitas, J. Woodcock, and A. Butterfield. POSIX and the verification grand challenge: A roadmap. In Proceedings of 13th IEEE International Conference on Engineering of Complex Computer Systems, pages 153--162, Mar.-Apr. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. R. Ganger and Y. N. Patt. Metadata update performance in file systems. In Proceedings of the 1st Symposium on Operating Systems Design and Implementation (OSDI), pages 49--60, Monterey, CA, Nov. 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. P. Gardner, G. Ntzik, and A. Wright. Local reasoning for the POSIX file system. In Proceedings of the 23rd European Symposium on Programming, pages 169--188, Grenoble, France, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. B. Gribincea et al. Ext4 data loss. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781, Jan. 2009.Google ScholarGoogle Scholar
  23. M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages Systems, 12(3):463--492, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. W. H. Hesselink and M. Lali. Formalizing a hierarchical file system. In Proceedings of the 14th BCS-FACS Refinement Workshop, pages 67--85, Dec. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. Hutchings. {PATCH 3.2 027/115} jbd2: fix fs corruption possibility in jbd2_journal_destroy() on umount path. https://lkml.org/lkml/2016/4/26/1230, April 2016.Google ScholarGoogle Scholar
  26. IEEE (The Institute of Electrical and Electronics Engineers) and The Open Group. The Open Group base specifications issue 7, 2013 edition (POSIX.1-2008/Cor 1-2013), Apr. 2013.Google ScholarGoogle Scholar
  27. D. Jones. Trinity: A Linux system call fuzz tester, 2014. http://codemonkey.org.uk/projects/trinity/.Google ScholarGoogle Scholar
  28. R. Joshi and G. J. Holzmann. A mini challenge: Build a verifiable filesystem. Formal Aspects of Computing, 19 (2):269--272, June 2007. Google ScholarGoogle ScholarCross RefCross Ref
  29. E. Kang and D. Jackson. Formal modeling and analysis of a Flash filesystem in Alloy. In Proceedings of the 1st Int'l Conference of Abstract State Machines, B and Z, pages 294--308, London, UK, Sept. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Kara. {PATCH} ext4: Forbid journal_async_commit in data=ordered mode. http://permalink.gmane.org/gmane.comp.file-systems.ext4/46977, Nov. 2014.Google ScholarGoogle Scholar
  31. J. Kara. ext4: fix crashes in dioread_nolock mode. http://permalink.gmane.org/gmane.linux.kernel.commits.head/575311, Feb. 2016.Google ScholarGoogle Scholar
  32. E. Koskinen and J. Yang. Reducing crash recoverability to reachability. In Proceedings of the 43rd ACM Symposium on Principles of Programming Languages (POPL), pages 97--108, St. Petersburg, FL, Jan. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Linux Kernel Developers. Ext4 filesystem, 2017. https://www.kernel.org/doc/Documentation/filesystems/ext4.txt.Google ScholarGoogle Scholar
  34. L. Lu, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and S. Lu. A study of Linux file system evolution. In Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST), pages 31--44, San Jose, CA, Feb. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Mickens, E. Nightingale, J. Elson, B. Fan, A. Kadav, V. Chidambaram, O. Khan, K. Nareddy, and D. Gehring. Blizzard: Fast, cloud-scale block storage for cloud-oblivious applications. In Proceedings of the 11th Symposium on Networked Systems Design and Implementation (NSDI), Seattle, WA, Apr. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. K. Mostafa. {PATCH 3.13 075/103} jbd2: fix descriptor block size handling errors with journal_csum. https://lkml.org/lkml/2014/9/30/747, Sept. 2014.Google ScholarGoogle Scholar
  37. K. Mostafa. ext4: fix null pointer dereference when journal restart fails. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=9d506594069355d1fb2de3f9104667312ff08ed3, June 2016.Google ScholarGoogle Scholar
  38. D. Park and D. Shin. iJournaling: Fine-grained journaling for improving the latency of fsync system call. In Proceedings of the 2017 USENIX Annual Technical Conference, pages 787--798, Santa Clara, CA, July 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. A. Pavlo. Python implementation of TPC-C, 2017. https://github.com/apavlo/py-tpcc.Google ScholarGoogle Scholar
  40. T. S. Pillai, V. Chidambaram, R. Alagappan, S. Al-Kiswany, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. All file systems are not created equal: On the complexity of crafting crash-consistent applications. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 433--448, Broomfield, CO, Oct. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. T. S. Pillai, R. Alagappana, L. Lu, V. Chidambaram, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Application crash consistency and performance with CCFS. In Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST), pages 181--196, Santa Clara, CA, Feb.-Mar. 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, pages 55--74, Copenhagen, Denmark, July 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. T. Ridge, D. Sheets, T. Tuerk, A. Giugliano, A. Madhavapeddy, and P. Sewell. SibylFS: formal specification and oracle-based testing for POSIX and real-world file systems. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP), pages 38--53, Monterey, CA, Oct. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. X. Roche, G. Clare, K. Schwarz, P. Eggert, J. Schilling, A. Josey, and J. Pugsley. Necessary step(s) to synchronize filename operations on disk. Austin Group Defect Tracker, Mar. 2013. http://austingroupbugs.net/view.php?id=672.Google ScholarGoogle Scholar
  45. M. Rosenblum and J. Ousterhout. The design and implementation of a log-structured file system. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (SOSP), pages 1--15, Pacific Grove, CA, Oct. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. E. Sandeen. {PATCH} ext4: fix unjournaled inode bitmap modification. http://permalink.gmane.org/gmane.comp.file-systems.ext4/35119, Oct. 2012.Google ScholarGoogle Scholar
  47. H. Sigurbjarnarson and X. Wang. Personal communication, Mar. 2017.Google ScholarGoogle Scholar
  48. H. Sigurbjarnarson, J. Bornholt, E. Torlak, and X. Wang. Push-button verification of file systems via crash refinement. In Proceedings of the 12th Symposium on Operating Systems Design and Implementation (OSDI), pages 1--16, Savannah, GA, Nov. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. H. Sigurbjarnarson, J. Bornholt, E. Torlak, and X. Wang. The Yggdrasil toolkit, 2017. https://github.com/locore/yggdrasil/.Google ScholarGoogle Scholar
  50. M. Szeredi. ovl: fsync after copy-up. https://github.com/torvalds/linux/commit/641089c1549d8d3df0b047b5de7e9a111362cdce, Oct. 2016.Google ScholarGoogle Scholar
  51. The Linux man-pages project. fdatasync(2): Linux man page, 2017. http://man7.org/linux/man-pages/man2/fdatasync.2.html.Google ScholarGoogle Scholar
  52. The Open Group. fdatasync: synchronize the data of a file, 2016. http://pubs.opengroup.org/onlinepubs/9699919799/functions/fdatasync.html.Google ScholarGoogle Scholar
  53. T. Ts'o. {PATCH} ext4, jbd2: add REQ_FUA flag when recording an error flag. http://permalink.gmane.org/gmane.comp.file-systems.ext4/49323, July 2015.Google ScholarGoogle Scholar
  54. T. Ts'o. {PATCH} ext4: use private version of page_zero_new_buffers() for data=journal mode. https://lkml.org/lkml/2015/10/9/1, Oct. 2015.Google ScholarGoogle Scholar
  55. T. Ts'o. ext4: fix race between truncate and __ext4_journalled_writepage(). https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=bdf96838aea6a265f2ae6cbcfb12a778c84a0b8e, June 2015.Google ScholarGoogle Scholar
  56. S. C. Tweedie. Journaling the Linux ext2fs filesystem. In Proceedings of the 4th Annual LinuxExpo, Durham, NC, May 1998.Google ScholarGoogle Scholar
  57. S. Wang. Certifying checksum-based logging in the RapidFSCQ crash-safe filesystem. Master's thesis, Massachusetts Institute of Technology, June 2016.Google ScholarGoogle Scholar
  58. M. Wenzel. Some aspects of Unix file-system security, Aug. 2014. http://isabelle.in.tum.de/library/HOL/HOL-Unix/Unix.html.Google ScholarGoogle Scholar
  59. D. J. Wong. jbd2: Fix endian mixing problems in the checksumming code. http://lists.openwall.net/linux-ext4/2013/07/17/1, July 2013.Google ScholarGoogle Scholar
  60. D. J. Wong. {PATCH} ext4: fix same-dir rename when inline data directory overflows. http://permalink.gmane.org/gmane.comp.file-systems.ext4/45594, Aug. 2014.Google ScholarGoogle Scholar
  61. J. Xu and S. Swanson. NOVA: A log-structured file system for hybrid volatile/non-volatile main memories. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST), pages 323--338, Santa Clara, CA, Feb. 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. J. Yang, P. Twohey, D. Engler, and M. Musuvathi. Using model checking to find serious file system errors. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), pages 273--287, San Francisco, CA, Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. J. Yang, C. Sar, P. Twohey, C. Cadar, and D. Engler. Automatically generating malicious disks using symbolic execution. In Proceedings of the 27th IEEE Symposium on Security and Privacy, pages 243--257, Oakland, CA, May 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. J. Yang, P. Twohey, D. Engler, and M. Musuvathi. eXplode: A lightweight, general system for finding serious storage system errors. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI), pages 131--146, Seattle, WA, Nov. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. M. Zheng, J. Tucek, D. Huang, F. Qin, M. Lillibridge, E. S. Yang, B. W. Zhao, and S. Singh. Torturing databases for fun and profit. In Proceedings of the 11th Symposium on Operating Systems Design and Implementation (OSDI), pages 449--464, Broomfield, CO, Oct. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  1. Verifying a high-performance crash-safe file system using a tree specification

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SOSP '17: Proceedings of the 26th Symposium on Operating Systems Principles
      October 2017
      677 pages
      ISBN:9781450350853
      DOI:10.1145/3132747

      Copyright © 2017 Owner/Author

      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 14 October 2017

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate131of716submissions,18%

      Upcoming Conference

      SOSP '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader