ABSTRACT
C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (either following ISO or differing) and where there are conflicts. We apply all this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility, and it is executable as a test oracle on small examples. This should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope it will be a step towards clear, consistent, and accepted semantics for the various use-cases of C.
- Programming Languages — C. 2011.Google Scholar
- ISO/IEC 9899:2011. A non-final but recent version is available at www.open-std.org/jtc1/sc22/wg14/docs/n1539.pdf.Google Scholar
- ANSI. Programming Languages – C: ANSI X3.159-1989. 1989.Google Scholar
- M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber. Mathematizing C++ concurrency. In Proc. POPL, 2011. Google ScholarDigital Library
- P. Becker, editor. Programming Languages — C++. 2011. ISO/IEC 14882:2011.Google Scholar
- A. Bessey, K. Block, B. Chelf, A. Chou, B. Fulton, S. Hallem, C. Henri-Gros, A. Kamsky, S. McPeak, and D. Engler. A few billion lines of code later: using static analysis to find bugs in the real world. Commun. ACM, 53(2), 2010. Google ScholarDigital Library
- F. Besson, S. Blazy, and P. Wilke. A precise and abstract memory model for C using symbolic values. In APLAS, 2014.Google ScholarCross Ref
- F. Besson, S. Blazy, and P. Wilke. A concrete memory model for CompCert. In Proc. ITP, 2015.Google ScholarCross Ref
- H.-J. Boehm and S. Adve. Foundations of the C++ concurrency memory model. In Proc. PLDI, 2008. Google ScholarDigital Library
- G. Canet, P. Cuoq, and B. Monate. A value analysis for C programs. In Proc. 9th IEEE Intl. Working Conference on Source Code Analysis and Manipulation, SCAM ’09, 2009. Google ScholarDigital Library
- D. Chisnall, J. Matthiesen, K. Memarian, K. Nienhuis, P., and R.N.M. Watson. C memory object and value semantics: the space of de facto and ISO standards, 2016. http://www.cl.cam.ac.uk/~pes20/cerberus/.Google Scholar
- D. Chisnall, C. Rothwell, R. N. M. Watson, J. Woodruff, M. Vadera, S. W. Moore, M. Roe, B. Davis, and P. G. Neumann. Beyond the PDP-11: Architectural support for a memory-safe C abstract machine. In Proc. ASPLOS, 2015. Google ScholarDigital Library
- E. Cohen, M. Moskal, S. Tobies, and W. Schulte. A precise yet efficient memory model for C. Electron. Notes Theor. Comput. Sci. (SSV 2009), 254:85–103, October 2009. Google ScholarDigital Library
- J. Cook and S. Subramanian. A formal semantics for C in Nqthm. Technical Report 517D, Trusted Information Systems, October, 1994.Google Scholar
- J. Criswell, N. Geoffray, and V. Adve. Memory safety for low-level software/hardware interactions. In Proceedings of the Eighteenth Usenix Security Symposium, August 2009. Google ScholarDigital Library
- J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic. Hardbound: Architectural support for spatial safety of the C programming language. In Proc. ASPLOS, 2008. Google ScholarDigital Library
- C. Ellison and G. Ro¸su. An executable formal semantics of C with applications. In Proc. POPL, 2012. Google ScholarDigital Library
- Y. Gurevich and J. K. Huggins. The semantics of the C programming language. In Proc. CSL ’92, 1993. Google ScholarDigital Library
- C. Hathhorn, C. Ellison, and G. Rosu. Defining the undefinedness of C. In Proc. PLDI, 2015. Google ScholarDigital Library
- Charles McEwen Ellison III. A Formal Semantics of C with Applications. PhD thesis, University of Illinois at Urbana-Champaign, 2012.Google Scholar
- Runtime Verification Inc. Rv-match v0.1. https://runtimeverification.com/match/download/, 2016. Downloaded 2016-03-11.Google Scholar
- Intel Plc. Introduction to Intel memory protection extensions, July 2013.Google Scholar
- T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. In Proc. USENIX ATC, 2002. Google ScholarDigital Library
- J. Kang, C.-K. Hur, W. Mansky, D. Garbuzov, S. Zdancewic, and V. Vafeiadis. A formal C memory model supporting integer-pointer casts. In Proc. PLDI, 2015. Google ScholarDigital Library
- B. W. Kernighan and D. M. Ritchie. The C Programming Language (ANSI C). Prentice Hall, 2nd edition, 1988. Google ScholarDigital Library
- R. Krebbers. Aliasing restrictions of C11 formalized in Coq. In Proc. CPP, LNCS 8307, 2013. Google ScholarDigital Library
- R. Krebbers. An operational and axiomatic semantics for nondeterminism and sequence points in C. In Proc. POPL, 2014. Google ScholarDigital Library
- R. Krebbers. The C standard formalized in Coq. PhD thesis, Radboud University Nijmegen, December 2015.Google Scholar
- R. Krebbers and F. Wiedijk. Separation logic for non-local control flow and block scope variables. In FoSSaCS, 2013. Google ScholarDigital Library
- R. Krebbers and F. Wiedijk. A typed C11 semantics for interactive theorem proving. In Proc. CPP, 2015. Google ScholarDigital Library
- X. Leroy, A. W. Appel, S. Blazy, and G. Stewart. The CompCert memory model, version 2. Research report RR-7987, INRIA, June 2012.Google Scholar
- X. Leroy and S. Blazy. Formal verification of a C-like memory model and its uses for verifying program transformations. Journal of Automated Reasoning, 41(1):1–31, 2008. Google ScholarDigital Library
- K. Memarian, J. Matthiesen, K. Nienhuis, V. B. F. Gomes, J. Lingard, D. Chisnall, R. N. M. Watson, and P. Sewell. Cerberus, 2016.Google Scholar
- www.cl.cam.ac.uk/~pes20/cerberus, www.repository.cam.ac.uk/handle/1810/255730.Google Scholar
- D. P. Mulligan, S. Owens, K. E. Gray, T. Ridge, and P. Sewell. Lem: reusable engineering of real-world semantics. In Proc. ICFP, 2014. Google ScholarDigital Library
- S. Nagarakatte, J. Zhao, M. M.K. Martin, and S. Zdancewic. SoftBound: highly compatible and complete spatial memory safety for C. In Proc. PLDI, 2009. Google ScholarDigital Library
- G. C. Necula, S. McPeak, and W. Weimer. CCured: type-safe retrofitting of legacy code. In Proc. POPL, 2002. Google ScholarDigital Library
- N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In PLDI, 2007. Google ScholarDigital Library
- K. Nienhuis, K. Memarian, and P. Sewell. An operational semantics for C/C++11 concurrency, 2016. Draft available at www.cl.cam.ac.uk/~pes20/cerberus.Google Scholar
- M. Norrish. C formalised in HOL. Technical Report UCAMCL-TR-453, U. Cambridge, Computer Laboratory, 1998.Google Scholar
- M. Norrish. Deterministic expressions in C. In ESOP, 1999. Google ScholarDigital Library
- N. S Papaspyrou. A formal semantics for the C programming language. PhD thesis, National Technical University of Athens, 1998.Google Scholar
- F. Pottier and Y. Régis-Gianas. Menhir. gallium.inria.fr/~fpottier/menhir/.Google Scholar
- J. Regehr. blog.regehr.org/archives/721, 2012.Google Scholar
- J. Regehr, April 2015. blog.regehr.org/archives/1234.Google Scholar
- TrustInSoft. trust-in-soft.com/tis-interpreter, 2015.Google Scholar
- H. Tuch, G. Klein, and M. Norrish. Types, bytes, and separation logic. In Proc. POPL, 2007. Google ScholarDigital Library
- J. Ševˇcík, V. Vafeiadis, F. Zappa Nardelli, S. Jagannathan, and P. Sewell. CompCertTSO: A verified compiler for relaxedmemory concurrency. J. ACM, 60(3), June 2013. Google ScholarDigital Library
- X. Wang, H. Chen, A. Cheung, Z. Jia, N. Zeldovich, and M. F. Kaashoek. Undefined behavior: what happened to my code? In Proc. APSYS, 2012. Google ScholarDigital Library
- X. Wang, N. Zeldovich, M. F. Kaashoek, and A. Solar-Lezama. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In Proc. SOSP, 2013. Google ScholarDigital Library
- R. N. M. Watson, P. G. Neumann, J. Woodruff, M. Roe, J. Anderson, D. Chisnall, B. Davis, A. Joannou, B. Laurie, S. W. Moore, S. J. Murdoch, and R. Norton. Capability hardware enhanced RISC instructions: CHERI instruction-set architecture. Technical Report UCAM-CL-TR-864, University of Cambridge, Computer Laboratory, November 2015.Google Scholar
- R. N. M. Watson, J. Woodruff, P. G. Neumann, S. W. Moore, J. Anderson, D. Chisnall, N. H. Dave, B. Davis, K. Gudka, B. Laurie, S. J. Murdoch, R. Norton, M. Roe, S. Son, and M. Vadera. CHERI: A hybrid capability-system architecture for scalable software compartmentalization. In IEEE Symposium on Security and Privacy, SP, 2015. Google ScholarDigital Library
- WG14. Defect report summary for ISO/IEC 9899:1999.Google Scholar
- WG14. ISO/IEC 9899:2011.Google Scholar
- WG14. Defect report 260, September 2004.Google Scholar
- www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm.Google Scholar
- J. Woodruff, R. N. M. Watson, D. Chisnall, S. W. Moore, J. Anderson, B. Davis, B. Laurie, P. G. Neumann, R. Norton, and M. Roe. The CHERI capability model: revisiting RISC in an age of risk. In ISCA, 2014. Google ScholarDigital Library
- X. Yang, Y. Chen, E. Eide, and J. Regehr. Finding and understanding bugs in C compilers. In Proc. PLDI, 2011. Google ScholarDigital Library
- J. Zhao, S. Nagarakatte, M. M.K. Martin, and S. Zdancewic. Formalizing the LLVM intermediate representation for verified program transformations. In Proc. POPL, 2012. Google ScholarDigital Library
Index Terms
Into the depths of C: elaborating the de facto standards
Recommendations
Into the depths of C: elaborating the de facto standards
PLDI '16C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, ...
ANSI's role in standards development
ANSI is the American National Standards Institute, the US national coordinating institute for voluntary standards. Typically, you will see ANSI in a standard's name followed by some letters and numbers such as X3.181-1980. These characters represent the ...
Influences on standards adoption in de facto standardization
Special issue on New Theories and Methods for Technology Adoption ResearchIn the IT industry, de facto standards emerge from standards competition as firms offer incompatible technologies, and user choices determine the outcome of the competition. The standards literature suggests that strong network effects create a bias ...
Comments