Reply-to: David Goldblatt (davidtgoldblatt@gmail.com, davidgoldblatt@fb.com)

Document no: WG14 / N2293

Date: 2018-09-16

N2293: Alignment requirements for memory management functions

Summary

The alignment requirements of malloc, calloc, and realloc are somewhat confusingly phrased, in a way that affects small allocations (sizes less than _Alignof(max_align_t)). Some readers (and implementations) interpret them to demand _Alignof(max_align_t)-alignment even for allocation sizes that could not hold an object with that alignment. We call this the "strong-alignment" reading. Other readers (and implementations) interpret them as requiring the returned memory to be aligned only enough to accommodate those types that could inhabit the returned memory. In particular, since sizeof(T) >= _Alignof(T) for all portably defined types T, allocations with sizes smaller than _Alignof(max_align_t) need only be aligned to the largest power of two less than or equal to the requested size. We call this the "weak-alignment" reading.

Many implementations only provide weak-alignment guarantees, and some cases these guarantees are important in conserving both CPU and memory. Therefore, we propose clarifying the wording such that the weak-alignment implementation is unambiguously allowed. Strong-alignment implementations remain correct, and may document the guarantee as an extension.

The current wording

C17, 7.22.3 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated".

The two clauses of interest here are "it may be assigned to a pointer to any type of object with a fundamental alignment requirement" and "and then used to access such an object or an array of such objects in the space allocated". Strong-alignment readers tend to see both clauses as independent, whereas weak-alignment readers tend to see the second as an adverbial clause modifying the first. Neither reading is completely satisfactory. The strong-alignment view implies an independent clause with a tense mismatch or a missing verb; the weak-alignment view uses "and then" when "then" would suffice, which could imply a sequencing at odds with the interpretation.

In the author's view, attempting to declare one of these two parsings correct is a waste of time. Many different implementers, fluent in English and acting in good faith, have parsed the sentence both ways. The question is how to reconcile the divergence.

Implementations and implementer opinions

The author has experimented with and inspected the source code for as many different implementations as possible. These are his best guesses based on consultations with maintainers and the observed behavior of implementations (if possible), or source-code inspection alone (if not).

Strong-alignment environments:

Weak-alignment environments:

I also reached out to those maintainers with whom I could find a mutual professional connection:

(There's no filtering of responses here; every group of maintainers I was able to contact is included, except for one that was unable to obtain legal permission to comment publicly).

This is not the first time this vagueness has been addressed; the response to DR75 endorses the strong-alignment interpretation. However, that response never become normative (via inclusion in a technical corrigendum or international standard). Moreover, it was written before the introduction of the notion of fundamental alignments, and thereby implies that all returned pointers must be suitably aligned for all types; including vector types, types with implementation-specific alignment specifiers, etc. Most compilers with which the author is familiar never provided such a strong guarantee in between DR75 and C11.

Proposed wording

We propose rewording so that the weak-alignment interpretation is specified, and replacing:

"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated"

with

"The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and size less than or equal to the size requested. It may then be used to access such an object or an array of such objects in the space allocated"

This makes all the implementations listed in the previous section conforming.

In subsequent sections, we'll argue that this change does not induce code breakage, and that changing the weak-alignment implementations to be strong-alignment ones would cause significant performance regressions.

Code breakage

A strong-alignment reader might argue that the proposed rewording breaks existing code; alignment assumptions that they read as holding in all conforming implementations now only hold given an implementation-level extension. However, even granting their parsing of the wording, this guarantee was only ever true in theory, not in practice. The number of weak-alignment implementations meant that the code in question was never meaningfully portable except to those implementations that opted-in in the first place.

Nevertheless, in an attempt to quantify the amount of such non-portable code, I examined the FreeBSD ports project. Many (over 30,000, though not all of them written in C) third-party programs are available for installation on FreeBSD systems. Because many of these programs are developed primarily on other operating systems, the ports project includes the patches required to allow projects to work on FreeBSD. If a program uses the strong-alignment assumption, we would expect to see a patch involving malloc, realloc, calloc, or max_align_t in the ports project, tweaking the size argument (since FreeBSD's libc malloc is weak-alignment). Manual inspection of all instances of these strings in these patches indicated that this was not the case.

However, it may still be useful to consider a few specific scenarios:

In summary: code that requires strong-alignment assumptions for correctness is rare enough that the above search was unable to find it. Any such code that does exist relies on platform-dependent behavior in practice if not in theory, and therefore is not any more or less portable with the proposed rewording than in the status quo.

Performance considerations

Another way of dealing with the split would be to clarify in the strong-alignment direction, and demand that the weak-alignment implementations change their behavior. However, doing so would impose nontrivial costs on those implementations. The weak-alignment interpretation has a number of performance advantages over the strong-alignment one.

Strong-alignment implementations are also valid weak-alignment implementations, and therefore an implementation that obtains performance improvements from a strong-alignment guarantee can of course continue to provide them.

Note that most compiler-level optimizations don't benefit from assuming strong-alignment. The optimizer cannot in general assume that a pointer it sees came from malloc unless it sees it returned from malloc or passed to free (and, in the former case, it's free to expand the size of the allocation if desired). We are concerned only with small allocations here, so having a small alignment cost multiplied by iteration through a loop is not a concern. Moreover, situations in which an operation on a N/2-byte object can be sped up when it is known that the object is N-byte aligned are relatively uncommon.

To attempt to quantify the possible regression, I measured the performance changes on an allocator that can be configured to provide either the strong or the weak alignment guarantees (jemalloc), on two large (heaps in the 10s of GBs) proprietary server processes. Each showed a 1.1% regression in resident set size, and 0.7% and 0.8% regressions in CPU consumption.

To get numbers based only on publicly available data, I measured GCC 4.8.5's runtime and memory consumption when compiling GCC 4.0.0's sources concatenated into a single file (the former version being chosen because it is the default compiler provided by my operating system, and the latter because of its easy availability at https://people.csail.mit.edu/smcc/projects/single-file-programs/), and measured runtime and memory consumption. These numbers are for three runs, after an initial “burn-in” run for both versions:

Comparing best-to-best run in each category gives a 2.5% runtime increase and 1.7% RSS increase from enforcing strong alignment. Median-to-Median gives a 4.3% runtime increase and 2.0% RSS increase. Worst-to-worst gives a 4.2% runtime increase and 0.04% RSS increase.

Conclusion

Regardless of whether or not strong-alignment is required in the Platonic reading of C as standardized, it is not provided in C as implemented. Clarifying the wording in the strong-alignment direction imposes performance regressions on existing implementations. Leaving the wording unchanged confuses users and implementers. Clarifying the wording in the weak-alignment direction avoids these costs, and does not change the portability of programs in practice.