==========================================================

Document: N2223

Related:

WG14 Pittsburgh meeting, 2016-10

WG14 London meeting, 2016-04

1 Introduction

We have been working for some time to clarify the memory object model of C: the behaviour of pointer operations, uninitialised values, padding, effective types, and so on. These have been vexed questions for many years. C values cannot be treated as simple abstract or concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations. The current ISO standard some aspects of this unclear, and in some aspects conflicts with de facto standard usage - which itself is difficult to investigate.

We have subdivided the problem into six areas, covering subsets of the questions from our London 2016-04 WG14 N2013. For the first four we provide notes for the Brno 2018-04 WG14 meeting, the first three of which are revisions of our N2089, N2090, N2091 from the Pittsburgh 2016-10 meeting.

In several areas there are multiple desirable semantic options, e.g. for C with and without the mainstream implementation -fno-strict-aliasing flag, which has a profound effect on the language. The ISO standard has historically not defined such options (beyond the current implementation-defined aspects), but we believe that in the next major version it should define the major options. We identify some of them (some already existing, some new) in these notes. For some, one might want it to be implementation-defined which options are supported. This also provides a way of defining common "safer C" dialects, by a suitable combination of options.

Our questions and proposals are informed by our previous surveys of C experts, by extensive discussion with many people, and by our Cerberus project work, in which we are developing a rigorous semantic model for a substantial fragment of C.

Our model covers many features of C, both syntactic and semantic, but to keep the problem manageable we exclude some important aspects. We do not currently attempt to cover preprocessor features, C11 character-set features, floating-point and complex types (beyond simple float constants), user-defined variadic functions (we do cover printf), bitfields, volatile, restrict, generic selection, register, flexible array members, some exotic initialisation forms, signals, longjmp, multiple translation units, most of the standard library, or concurrency. We focus on the C commonly used for mainstream systems programming without effective types, with -fno-strict-aliasing. We are not trying to cover every conceivable C implementation, or those on exotic architectures. Our semantics is intended as a source semantics for C. For an intermediate language, e.g. LLVM, there are related but distinct goals. Our implementation and web tool are intended to explore the behaviour of small test cases, which is already challenging, not as a bug-finding tool for production C code, which would need considerably higher performance and coverage. Finally, while we have done significant testing and validation, more would always be desirable; Cerberus is work in progress, not a completely finished system.