N2827
Modern Bit Utilities

Published Proposal,

Previous Revisions:
None
Author:
Paper Source:
GitHub
Issue Tracking:
GitHub
Project:
ISO/IEC JTC1/SC22/WG14 9899: Programming Language — C
Proposal Category:
Change Request
Target:
General Developers

Abstract

Endian preprocessor macros, endian enumerations, byte swapping, big endian / little endian functions, and several bit utilities that have become commonplace amongst compilers, bytecodes, and implementations.

1. Changelog

1.1. Revision 0 - October 15th, 2021

2. Introduction & Motivation

There are a lot of proposals and work that go into figuring out the "byte order" of integer values that occupy more than 1 octet (8 bits). This is nominally important when dealing with data that comes over network interfaces and is read from files, where the data can be laid out in various orders of octets for 2-, 3-, 4-, 6-, or 8-tuples of octets. The most well-known endian structures on existing architectures include "Big Endian", where the least significant byte comes "last" and is featured prominently in network protocols and file protocols; and, "Little Endian", where the least significant byte comes "first" and is typically the orientation of data for processor and user architectures most prevalent today.

In more legacy architectures (Honeywell, PDP), there also exist other orientations called "mixed" or "middle" endian. The uses of such endianness are of dubious benefit and are vanishingly rare amongst commodity and readily available hardware today, but nevertheless still represent an applicable ordering of octets.

In other related programming interfaces, the C functions/macros ntoh ("network to host") and hton ("host to network") (usually suffixed with l or ul or others to specify which native data type it was being performed on such as long) were used to change the byte order of a value ([ntohl]). This became such a common operation that many compilers - among them Clang and GCC - optimized the code down to use an intrinsic __builtin_byteswap(...)/__builtin_bswap(...) (for MSVC, for Clang, and for GCC). These intrinsics often compiled into binary code representing cheap, simple, and fast byte swapping instructions available on many CPUs for 16, 32, 64, and sometimes 128 bit numbers. The bswap/byteswap intrinsics were used as the fundamental underpinning for the ntoh and hton functions, where a check for the translation-time endianness of the program determined if the byte order would be flipped or not.

This proposal puts forth the fundamentals that make a homegrown implementation of htonl, ntoh, and other endianness-based functions possible in Standard C code. It also addresses many of the compiler-based intrinsics found to generate efficient machine code, with a few simpler utilities layered on top of it.

3. Design

This is a library addition. It is meant to expose both macros and enumeration values that can be used for both translation-time checks and execution-time arguments for endianness. It provides a way to check the endianness within the preprocessor, and gives definitive names that allow for knowing whether the endianness is big, little, or neither. We state big, little, or neither, because there is no settled-upon name for the legacy endianness of "middle" or "mixed", nor any agreed upon ordering for such a "middle" or "mixed" endianness between architectures. This is not the case for big endian or little endian, where one is simply the reverse of the other, always, in every case, across architectures, file protocols, and network specifications.

This design also provides a small but essential suite of bit utilities, all within the #include <stdbit.h> header.

3.1. Preliminary: Why the stdc_ prefix?

We use the stdc_ prefix for these functions so that we do not have to struggle with taking common words away from the end user. Because we now have 31 bytes of linker name significance, we can afford to have some sort of prefix rather than spend all of our time carving out reserved words or header-specific extensions. This will let us have good names that very clearly map to industry practice, without replacing industry code or being forced to be compatible with existing code that already has taken the name with sometimes-conflicting argument conventions.

3.2. Charter: unsigned char const ptr[static sizeof(uintN_t)] and More?

There are 2 choices on how to represent sized pointer arguments. The first is a void* ptr convention for function arguments in this proposal. The second is an unsigned char ptr[static n]/unsigned char ptr[sizeof(uintN_t)] convention.

To start, we still put any size + ptr arguments in the proper "size first, pointer second" configuration so that implementation extensions which allow void [static n] can exist no matter what choice is made here. That part does not change. The void* argument convention means that pointers to structures, or similar, can be passed to these functions without needing a cast. This represents the totality of the ease of use argument. The unsigned char ptr[static n] argument convention can produce both better compile-time safety and articulate requirements using purely the function declaration, without needing to look up prose from the C Standard or implementation documentation. The cost is that any use of the function will require a cast in strictly conforming code.

One of the tipping arguments in favor of our choice of unsigned char ptr[static n] is that void* can be dangerous, especially since we still do not have a nullptr constant in the language and 0 can be used for both the size and the pointer argument. (Which is, very sadly, an actual bug that happens in existing code. Especially when users mix memset and memcpy calls and use the wrong 0 argument because of writing one and meaning the other, and copying values over a large part of their 0-pointer in their low-level driver code.) Using an unsigned char* (or its statically-sized array function argument form) means that usage of the functions below would require explicit casting on the part of the user. This is, in fact, the way it is presented in [portable-endianness]: as far as existing practice is concerned, users of the code would rather cast and preserve safety than easily use something like stdc_memreverse with the guts of their structure.

3.3. The Endianness Enumeration

The enumeration is specified as follows:

#include <stdbit.h>

#define __STDC_ENDIAN_LITTLE__ /* some unique value */
#define __STDC_ENDIAN_BIG__ /* some other unique value */
#define __STDC_ENDIAN_NATIVE__ /* see below! */

typedef enum stdc_endian {
	stdc_endian_little = __STDC_ENDIAN_LITTLE__,
	stdc_endian_big = __STDC_ENDIAN_BIG__,
	stdc_endian_native = __STDC_ENDIAN_NATIVE__
} stdc_endian;

The goal of this enumeration is that if the system identifies as a "little endian" system, then __STDC_ENDIAN_LITTLE__ == __STDC_ENDIAN_NATIVE__, and that is how an end-user knows that the implementation is little endian. Similarly, a user can check __STDC_ENDIAN_BIG__ == __STDC_ENDIAN_NATIVE__, and they can know the implementation is big endian. Finally, if the system is neither big nor little endian, then __STDC_ENDIAN_NATIVE__ is a unique value that does not compare equal to either value:

#include <stdbit.h>
#include <stdio.h>

int main () {
	if (stdc_endian_native == stdc_endian_little) {
		printf("little endian! uwu\n");
	}
	else if (stdc_endian_native == stdc_endian_big) {
		printf("big endian OwO!\n");
	}
	else {
		printf("what is this?!\n");
	}
	return 0;
}
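The same check can also be made in the preprocessor, as the design section states. The following is a minimal sketch, not the proposed header: the SKETCH_ names and sketch_endian_name are hypothetical, and the mapping onto __BYTE_ORDER__ assumes a Clang/GCC-family compiler (any other compiler falls through to "neither" here).

```c
/* Hypothetical stand-ins for the proposed macros, mapped onto the
   Clang/GCC predefined macros when those are available. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) \
	&& defined(__ORDER_BIG_ENDIAN__)
#define SKETCH_ENDIAN_LITTLE __ORDER_LITTLE_ENDIAN__
#define SKETCH_ENDIAN_BIG    __ORDER_BIG_ENDIAN__
#define SKETCH_ENDIAN_NATIVE __BYTE_ORDER__
#else
/* Unknown toolchain: choose values so that native matches neither. */
#define SKETCH_ENDIAN_LITTLE 1234
#define SKETCH_ENDIAN_BIG    4321
#define SKETCH_ENDIAN_NATIVE 0
#endif

/* The branch is selected at translation time; no run-time test remains. */
const char *sketch_endian_name(void) {
#if SKETCH_ENDIAN_NATIVE == SKETCH_ENDIAN_LITTLE
	return "little";
#elif SKETCH_ENDIAN_NATIVE == SKETCH_ENDIAN_BIG
	return "big";
#else
	return "neither";
#endif
}
```

Because the comparison happens in #if, code for the other byte orders is never compiled, which is the main difference from the enumeration-based check above.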

If a user has a Honeywell architecture or a PDP architecture, it is up to them to figure out which flavor of "middle endian"/"mixed endian"/"bi endian" they are utilizing. We do not give these a name in the enumeration because neither the Honeywell nor the PDP communities ever figured out which flavor of the 32-bit byte order of 2341/3412/2143/etc. was strongly assigned to which name ("mixed" endian? "mixed-big" endian? "bi-little" endian?), and since this is not a settled matter in existing practice we do not provide a name for it in the C Standard. It is also of dubious determination what the byte order for a 3-byte, 5-byte, 6-byte, or 7-byte integer is in these mixed-endian types, whereas both big and little have dependable orderings.

These same enumerations come from the (accepted) C++20 paper and idioms found in [p0463], which also went into a <bit> header. Similar ideas are also present in libraries such as [libcork-byte-order], which are hybrid C and C++ libraries that give definitions similar to the ones here. Compilers also define macros such as __BYTE_ORDER__ (Clang/GCC family), or are well-defined to be a certain endianness (Windows is always little-endian).

The other portion of this is that providing an enumeration helps users pass this information along to functions. Users defining functions that take an endianness, without the enumeration, would define it as so:

void my_conversion_unsafe(int endian, size_t data_size,
	unsigned char data[static data_size]);

The name may specify that it is for an endian, but the range of values is not really known without looking at the documentation. It is also impossible for the compiler to diagnose problematic uses: calling my_conversion_unsafe(4595944, 4, ptr); is legal, and compilers will not diagnose such a call as wrong. Now, consider the same with the enumeration:

void my_conversion_safe(stdc_endian endian, size_t data_size,
	unsigned char data[static data_size]);

This function call can get diagnosed in (some) implementations:

#include <stddef.h>

typedef enum stdc_endian {
	stdc_endian_little = __ORDER_LITTLE_ENDIAN__,
	stdc_endian_big = __ORDER_BIG_ENDIAN__,
	stdc_endian_native = __BYTE_ORDER__,
} stdc_endian;

void my_conversion_unsafe(int endian, size_t n, unsigned char ptr[static n]) {}
void my_conversion_safe(stdc_endian endian, size_t n, unsigned char ptr[static n]) {}

int main () {
	unsigned char arr[4];
	my_conversion_unsafe(48558395, sizeof(arr), arr);
	my_conversion_safe(48558395, sizeof(arr), arr);
	//                 ^
	// <source>:15:24: error: integer constant not in range 
	// of enumerated type 'stdc_endian' (aka 'enum stdc_endian') [-Werror,-Wassign-enum]
	my_conversion_unsafe((stdc_endian)48558395, sizeof(arr), arr);
	my_conversion_safe((stdc_endian)48558395, sizeof(arr), arr);
	return 0;
}

(Many current implementations do not diagnose it in the current landscape because such implicit conversions are, unfortunately, incredibly common, sometimes for good reason.)

3.3.1. A (Brief) Discussion of Endianness

There is a LOT of design space and deployed existing practice in the endianness space of both architectures and their instruction sets. A non-exhaustive list of behaviors is as follows:

Suffice to say, there exists a lot of deployed practice. Note that this list effectively has these concerns in priority order. The first is the most conventional software; as the list goes down, each occurrence becomes more rare and less interesting. Therefore, we try not to spend too much time focusing on what are effectively the edge cases of software and hardware. Some of the past choices in endianness and similar were simply due to "going with the flow" (PDP’s "2143" order) or severe historical baggage (early FORTRAN dealing in big endian floating point numbers, and those algorithms and serialization methods being given to PDP machines without thinking about the ordering). With much of the industry moving away from such modes in both newer mainframes and architectures and towards newer implementations and architectures, it does not seem prudent to try to standardize the multitude of their behaviors.

This proposal constrains its definition of endianness to integer types without padding, strictly because trying to capture the vast breadth of existing architectures and their practices can quickly devolve down a slope that deeply convolutes this proposal’s core mission: endian and bit utilities.

3.3.2. Hey! Some Architectures Can Change Their Endianness at Run-time!

This is beyond the scope of this proposal. This is meant to capture the translation-time endianness. There also does not appear to be any operating system written today that can tolerate an endianness change happening arbitrarily at runtime, after a program has launched. This means that the property is effectively a translation-time property, and therefore can be exposed as a compile-time constant. A future proposal to determine the run-time byte order is more than welcome from someone who has suitable experience dealing with such architectures and programs, and this proposal does not preclude their ability to provide such a run-time function e.g. stdc_endian get_execution_endian(void);.

Certain instruction sets have ways to set the endianness of registers, to change how data is accessed ([arm-setend]). This functionality is covered by byte swapping, and byte swaps can be implemented using the SETEND instruction plus an access. (The compiler would have to remember to unwind the endian state back to its original value, however, or risk contaminating the entire program and breaking things.)

3.3.3. Floating Point has a Byte Order, Too.

For the design of this paper, we strictly consider the design space for (unsigned) integers, only. Floating point numbers already have an implementation-defined byte order, and none of these functions are meant to interact with the floating point types. While the stdc_memreverse function can work on any memory region, which includes any structure, scalar, or similar type with or without padding bits, the function just swaps bytes. Nothing needs to be said about padding bits in this case, since the operation is well-defined in all cases.

It shall be noted that for C++, since C++20, its endian enumeration applies to all scalar types:

This subclause describes the endianness of the scalar types of the execution environment.

— C++ Standard Working Draft, bit.endian/p1

It does not specify what this means for padding bits or similar; nor, I think, does it have to. Byte order means very little for padding bits until serialization comes into play. C++ does not define any functions which do byte-order aware serialization. So, it does not have to write any specification governing what may or may not happen and the rest is left undefined / unspecified.

For this proposal, we focus purely on integer types and, more specifically, on integer types which do not have padding when we are defining the actual functions. While it is acknowledged that floating point types and pointers have byte orders too, we do not want to interact directly with these types when it comes to endianness load and store functions. Byte swaps, (bit) population counts, and other bit operations can be performed on floating point types after they have been copied or type-punned (with implementation checking/blessing) into equivalent unsigned integer objects to do the necessary work.
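The "copy first, then operate" route mentioned above can be sketched as follows. This is an illustration only: sketch_float_bits and sketch_float_popcount are hypothetical names, the code assumes float and uint32_t have the same size (checked at translation time), and it uses a plain loop rather than any proposed function.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into an unsigned integer.
   Assumption: sizeof(float) == sizeof(uint32_t). */
uint32_t sketch_float_bits(float f) {
	_Static_assert(sizeof(float) == sizeof(uint32_t),
		"sketch assumes a 32-bit float");
	uint32_t bits;
	memcpy(&bits, &f, sizeof bits);
	return bits;
}

/* Count the set bits of the copied representation. */
int sketch_float_popcount(float f) {
	uint32_t bits = sketch_float_bits(f);
	int count = 0;
	while (bits != 0) {
		bits &= bits - 1; /* clear the lowest set bit */
		++count;
	}
	return count;
}
```

Using memcpy keeps the copy well-defined without relying on implementation-blessed type punning; the bit operation itself only ever sees an unsigned integer.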

3.4. Generic Byte Swap and Exact-width Byte Swap

In order to accommodate both a wide variety of architectures but also support minimum-width integer optimized intrinsics, this proposal takes from the industry 2 forms of byteswap:

These end up inhabiting the stdbit.h header and have the following interface:

#include <stdbit.h>

void stdc_memreverse(size_t n, unsigned char ptr[static n]);
uintN_t stdc_byteswapuN(uintN_t value);

where N is the width of one of the exact-width integer types, such as 8, 16, 24, 32, 64, 128, and others. On most architectures, this matches the builtins (MSVC, Clang, GCC) and the result of compiler optimizations that produce instructions for many existing architectures as shown in the README of this portable endianness function implementation. In the case where the least-width types are not equivalent to the exact-width integer types, the specification requires that it only works with N bits of the uintN_t type, so as to prescribe portable behavior in the face of vastly fluctuating architectures. However, for the case where uintN_t is not exactly N bits, it is implementation-defined WHICH bits are untouched/unused in the uintN_t.
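To make the two forms concrete, here is a minimal sketch of each. The names sketch_memreverse and sketch_byteswap32 are hypothetical stand-ins, not the proposed functions, and the 32-bit version assumes uint32_t exists. An implementation would be expected to lower these to a byte-swap instruction or intrinsic where one is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic form: reverse n bytes in place. */
void sketch_memreverse(size_t n, unsigned char ptr[static n]) {
	for (size_t i = 0; i < n / 2; ++i) {
		unsigned char tmp = ptr[i];
		ptr[i] = ptr[n - 1 - i];
		ptr[n - 1 - i] = tmp;
	}
}

/* Exact-width form: swap the four octets of a 32-bit value
   using only shifts and masks. */
uint32_t sketch_byteswap32(uint32_t value) {
	return ((value & UINT32_C(0x000000FF)) << 24)
	     | ((value & UINT32_C(0x0000FF00)) << 8)
	     | ((value & UINT32_C(0x00FF0000)) >> 8)
	     | ((value & UINT32_C(0xFF000000)) >> 24);
}
```

The value-based form never touches memory, which is what lets compilers recognize the pattern; the memory-based form works for any object size, including the odd widths mentioned earlier.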

3.4.1. But Byte Swap Is Dangerous?

Byte swapping, by itself, is absolutely dangerous in terms of code portability. Users often program strictly for their own architecture when doing serialization, and do not take into consideration that their endianness can change. This means that, while byteswap functions can compile down to intrinsics, those intrinsics get employed to change "little endian" to "big endian" without performing the necessary "am I already in the right endianness" check. Values that are already in the proper byte order for their target serialization get swapped, resulting in an incorrect byte order for the target network protocol, file format, or other binary serialization target.

The inclusion of the <stdbit.h> header reduces this problem, but does not fully eliminate it. This is why many Linux and BSDs include functions which directly transcribe from one endianness to another. This is why the Byte Order Fallacy has spread so far in Systems Programming communities, and why many create their own versions of this both in official widespread vendor code ([linux-endian]) and in more personal code used for specific distributions ([portable-endianness]).
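The missing "am I already in the right endianness" check can be sketched as below. All names are hypothetical. The probe runs at execution time purely to keep the sketch portable, and the code assumes the host is either little or big endian (a mixed-endian host would need its own branch).

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when the least significant octet is stored first. */
int sketch_host_is_little(void) {
	uint32_t probe = UINT32_C(0x01020304);
	unsigned char first;
	memcpy(&first, &probe, 1);
	return first == 0x04;
}

/* Unconditional swap: only correct when a swap is actually needed. */
uint32_t sketch_swap32(uint32_t v) {
	return (v << 24) | ((v & UINT32_C(0xFF00)) << 8)
	     | ((v >> 8) & UINT32_C(0xFF00)) | (v >> 24);
}

/* Host order to big endian: swap only if the host is not big endian. */
uint32_t sketch_host_to_be32(uint32_t value) {
	return sketch_host_is_little() ? sketch_swap32(value) : value;
}
```

Calling sketch_swap32 directly would produce the wrong wire order on a big endian host; routing through the checked function gives the same bytes in memory on both kinds of host.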

3.5. Endian-Aware Load/Store Functions

Functions meant to transport bytes to a specific endianness need 2 pieces of information:

To represent any operation that goes from/to the byte order that things like long longs are kept in, the Linux/BSD/etc. APIs use the term "host", represented by h. Every other operation is represented by explicitly naming it, particularly as be or le for "big endian" or "little endian". Again, because of the severe confusion that comes from what the exact byte order a "mixed endian" multi byte scalar is meant to be in, there seems not to exist any widely available practice regarding what to call a PDP/Honeywell endian configuration. Therefore, mixed/bi/middle-endian is not included in this proposal. It can be added at a later date if the community ever settles on a well-defined naming convention that can be shared between codebases, standards, and industries.

The specification for the endianness functions borrows from many different sources listed above, and is as follows:

#include <stdbit.h>

void stdc_store_leuN(uint_leastN_t value, unsigned char ptr[static sizeof(value)]);
void stdc_store_beuN(uint_leastN_t value, unsigned char ptr[static sizeof(value)]);
uint_leastN_t stdc_load_leuN(const unsigned char ptr[static sizeof(uint_leastN_t)]);
uint_leastN_t stdc_load_beuN(const unsigned char ptr[static sizeof(uint_leastN_t)]);

This specification is marginally more complicated than the stdc_byteswapuN functions because these functions operate on uint_leastN_t, where N is the minimum-width bit value. These functions, on most normal implementations, will just fill in the exact number of 8, 16, 32, 64, etc. bits. But for Digital Signal Processors (DSPs), select embedded architectures, and many freestanding implementations, it is impossible to offer a CHAR_BIT == 8 guarantee. For example, some Digital Signal Processors have CHAR_BIT == 32, and uint_least8_t, uint_least16_t, uint_least24_t, and uint_least32_t are all aliased to the same fundamental type. These functions, therefore, are specified to work on 8-bit groupings, and use masks, shifts, and similar operations to work properly. This should produce consistent results across implementations. Finally, in cases where the type is wider than the N bits, the final value of the uint_leastN_t is implementation-defined, as the padding bits are initialized to 0 and the skipped / shifted bits are different.
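A minimal sketch of the mask-and-shift approach for N = 32 follows. The sketch_ names are hypothetical, and the buffer layout is an assumption made for illustration: one 8-bit grouping per array element, four elements in total, regardless of CHAR_BIT.

```c
#include <stdint.h>

/* Store the low 32 bits of value, least significant octet first. */
void sketch_store_leu32(uint_least32_t value, unsigned char ptr[static 4]) {
	for (int i = 0; i < 4; ++i)
		ptr[i] = (unsigned char)((value >> (8 * i)) & 0xFFu);
}

/* Store the low 32 bits of value, most significant octet first. */
void sketch_store_beu32(uint_least32_t value, unsigned char ptr[static 4]) {
	for (int i = 0; i < 4; ++i)
		ptr[i] = (unsigned char)((value >> (8 * (3 - i))) & 0xFFu);
}

/* Load four octets, least significant first. */
uint_least32_t sketch_load_leu32(const unsigned char ptr[static 4]) {
	uint_least32_t value = 0;
	for (int i = 0; i < 4; ++i)
		value |= (uint_least32_t)(ptr[i] & 0xFFu) << (8 * i);
	return value;
}

/* Load four octets, most significant first. */
uint_least32_t sketch_load_beu32(const unsigned char ptr[static 4]) {
	uint_least32_t value = 0;
	for (int i = 0; i < 4; ++i)
		value |= (uint_least32_t)(ptr[i] & 0xFFu) << (8 * (3 - i));
	return value;
}
```

Nothing here inspects the host byte order: the shifts operate on values, so the same octet sequence comes out on every host.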

3.6. Modern Bit Utilities

In addition, upon first pre-review of the paper there was a strong supporting groundswell for bit operations that have long been present in both hardware and as compiler intrinsics. This idea progressed naturally from the bswap and __builtin_bswap discussion. As indicated in [p0553] (merged into C++20 already), here’s a basic rundown of some common architectures and their support for various bit functionality:

operation       Intel/AMD     ARM         PowerPC
rotl            ROL           -           rldicl
rotr            ROR           ROR, EXTR   -
popcount        POPCNT        -           popcntb
leading_zero    BSR, LZCNT    CLZ         cntlzd
leading_one     -             CLS         -
trailing_zero   BSF, TZCNT    -           -
trailing_one    -             -           -

Many of the bit functions below are defined to ease portability to these architectures. For places where specific compiler idioms and automatic detection are not possible, similar assembly tricks or optimized implementations can be provided by C implementations. Further bit functions were also merged into C++, resulting in the current state of the C++ bit header. We try to take the most useful subset of these functions that most closely represent functionality on both old and new CPU architectures as well as common, necessary operations that have been around in the last 25 years for various industries.

3.6.1. "Why not only generic interfaces or (u)intmax_t interfaces?"

For many of the bit-based utilities, you will see it introduces functions with several suffixes for the various types. Often, it is asked: why? Even the GCC builtins for things like popcount only come in unsigned int, unsigned long, and unsigned long long forms. The answer is in the blank spaces in the table above: for architectures that do not have perfect instruction mappings for a given built-in type (e.g., ARM for popcount), the number of bits one is utilizing for the given function is actually incredibly important. There is a difference between counting for 8 bits in a loop and counting 64 bits (or larger for extended integer types), so the various forms are provided to allow implementations to produce the most efficient implementation on their platforms when the user requests a specific size.

The generic interfaces can be used by individuals who want automatic selection of the best. And, as shown in the § 6 Appendix, platforms can use any builtins or techniques at their disposal to select an appropriate builtin, instruction, or function call to fit the use case.

3.6.2. popcount

popcount (Population Count) is an older computer science term taken from the statistics / biology nomenclature to indicate how many bits are set within a grouping. It’s a very useful instruction with applications in everything from game development to scientific computing. It is also directly provided by many instruction sets. The API for it is as such:

#include <stdbit.h>

int stdc_popcountuc(unsigned char value);
int stdc_popcountus(unsigned short value);
int stdc_popcountui(unsigned int value);
int stdc_popcountul(unsigned long value);
int stdc_popcountull(unsigned long long value);

// type-generic macro
int stdc_popcount(generic_integer_type value);

It covers all of the built-in unsigned integer types. The type-generic macro supports all of the built-in types as well as any of the implementation-defined extended integer types. See the appendix for an implementation.
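As a rough illustration of how the suffixed functions and the type-generic macro fit together, consider the sketch below. The sketch_ names are hypothetical, the loop is a plain portable fallback rather than an instruction mapping, and every narrower function simply forwards to the widest one, which a real implementation would likely specialize per width.

```c
/* Portable fallback: clear the lowest set bit until none remain. */
int sketch_popcountull(unsigned long long value) {
	int count = 0;
	while (value != 0) {
		value &= value - 1;
		++count;
	}
	return count;
}

/* Narrower types widen without changing the number of set bits. */
int sketch_popcountuc(unsigned char value) { return sketch_popcountull(value); }
int sketch_popcountus(unsigned short value) { return sketch_popcountull(value); }
int sketch_popcountui(unsigned int value) { return sketch_popcountull(value); }
int sketch_popcountul(unsigned long value) { return sketch_popcountull(value); }

/* Type-generic selection; _Generic does not promote its operand,
   so unsigned char and unsigned short select their own functions. */
#define sketch_popcount(value) _Generic((value), \
	unsigned char: sketch_popcountuc, \
	unsigned short: sketch_popcountus, \
	unsigned int: sketch_popcountui, \
	unsigned long: sketch_popcountul, \
	unsigned long long: sketch_popcountull)(value)
```

Passing a signed or extended integer type to this sketch is a translation-time error, since no association matches; the proposed macro is described as accepting more types than this.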

3.6.3. rotate_left/rotate_right

rotate_left/rotate_right are common CPU instructions and the forms of the commonly-used circular shifts. They are common operations with applications in cyclic codes. They are commonly expressed (for 32-bit numbers) as value << count | value >> (32 - count) (rotate left) or value >> count | value << (32 - count) (rotate right).

#include <stdbit.h>

unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);

unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);

// type-generic macro
generic_integer_type stdc_rotate_left(generic_integer_type value, int count);
generic_integer_type stdc_rotate_right(generic_integer_type value, int count);

They cover all of the built-in unsigned integer types. Note that count is a signed integer! If (e.g.) stdc_rotate_leftuc(1, -1) is called, it behaves as a call to stdc_rotate_rightuc(value, -count); if (e.g.) stdc_rotate_rightuc(1, -1) is called, it behaves as a call to stdc_rotate_leftuc(value, -count). This matches the behavior from C++ and avoids undefined behavior, while also avoiding too-large shift errors from signed-to-unsigned conversions.

SDCC and several other compilers optimize for left and right shifts ([sdcc]). Texas Instruments and a handful of other specialist architectures also have "variable shift" instructions (SSHVL), which uses the sign of the argument to shift in one direction or the other ([ti-tms320c64x]). Having a rotate_left where a negative number produces the opposite rotate_right cyclic operation (and vice-versa) means that both of these architectures can optimize efficiently in the case of hardcoded constants, and still produce well-defined behavior otherwise (SSHVL instructions just deploy a "negated by default" for the count value or not, depending on whether the left or right variant is called, other architectures propagate the information to shift left or right.)
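One way to obtain the signed-count behavior without undefined shifts is to reduce the count modulo the width first. The sketch below is an assumption-laden illustration: the sketch_ names are hypothetical, and the width computation assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Map any signed count onto the range [0, width). */
static int sketch_reduce_count(int count, int width) {
	int r = count % width; /* sign follows the dividend */
	return r < 0 ? r + width : r;
}

unsigned int sketch_rotate_leftui(unsigned int value, int count) {
	const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
	const int r = sketch_reduce_count(count, width);
	if (r == 0)
		return value; /* avoids a shift by the full width */
	return (value << r) | (value >> (width - r));
}

unsigned int sketch_rotate_rightui(unsigned int value, int count) {
	const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
	const int r = sketch_reduce_count(count, width);
	if (r == 0)
		return value;
	return (value >> r) | (value << (width - r));
}
```

With this reduction a left rotation by -1 produces the same result as a right rotation by 1, and neither a zero count nor a count of INT_MIN reaches an out-of-range shift or a negation overflow.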

3.6.4. leading_zeroes, leading_ones, trailing_zeroes, and trailing_ones

leading_zeroes, leading_ones, trailing_zeroes, and trailing_ones are semi-common CPU instructions for counting the number of zeroes/ones from the most significant bit ("leading") and the least significant bit ("trailing"). C++ adopted this one using the names of the form count(l|r)_(zero|one). The l/r stand for "left" and "right". C++ uses left to match the concept of the left hand side of integers in lexical parsing and left shift operators in C and C++. We choose "leading" and "trailing" here as that’s the more common instruction name, and tie in a little bit better with "most/least significant bit" than "left" or "right" do. The name most_significant_zeroes (and its variations for the other 3 operations) can also work, albeit it would be one of the biggest names in the C standard library if we do choose it. (This could potentially be shortened to most_signif_zeroes or even most_sig_zeroes).

#include <stdbit.h>

int stdc_leading_zeroesuc(unsigned char value);
int stdc_leading_zeroesus(unsigned short value);
int stdc_leading_zeroesui(unsigned int value);
int stdc_leading_zeroesul(unsigned long value);
int stdc_leading_zeroesull(unsigned long long value);

int stdc_leading_onesuc(unsigned char value);
int stdc_leading_onesus(unsigned short value);
int stdc_leading_onesui(unsigned int value);
int stdc_leading_onesul(unsigned long value);
int stdc_leading_onesull(unsigned long long value);

int stdc_trailing_zeroesuc(unsigned char value);
int stdc_trailing_zeroesus(unsigned short value);
int stdc_trailing_zeroesui(unsigned int value);
int stdc_trailing_zeroesul(unsigned long value);
int stdc_trailing_zeroesull(unsigned long long value);

int stdc_trailing_onesuc(unsigned char value);
int stdc_trailing_onesus(unsigned short value);
int stdc_trailing_onesui(unsigned int value);
int stdc_trailing_onesul(unsigned long value);
int stdc_trailing_onesull(unsigned long long value);

// type-generic macros
int stdc_leading_zeroes(generic_integer_type value);
int stdc_leading_ones(generic_integer_type value);
int stdc_trailing_zeroes(generic_integer_type value);
int stdc_trailing_ones(generic_integer_type value);
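For architectures with blank entries in the table above, a portable fallback might look like the following sketch. The sketch_ names are hypothetical; the code assumes unsigned int has no padding bits and that an all-zero (or all-one) input yields the full width, which is the C++ behavior.

```c
#include <limits.h>

/* Count zero bits from the most significant bit downward. */
int sketch_leading_zeroesui(unsigned int value) {
	const int width = (int)(sizeof value * CHAR_BIT);
	int count = 0;
	for (unsigned int mask = 1u << (width - 1);
	     mask != 0 && (value & mask) == 0; mask >>= 1)
		++count;
	return count;
}

/* Count zero bits from the least significant bit upward. */
int sketch_trailing_zeroesui(unsigned int value) {
	const int width = (int)(sizeof value * CHAR_BIT);
	int count = 0;
	while (count < width && (value & 1u) == 0) {
		value >>= 1;
		++count;
	}
	return count;
}

/* The "ones" variants are the "zeroes" variants of the complement. */
int sketch_leading_onesui(unsigned int value) {
	return sketch_leading_zeroesui(~value);
}

int sketch_trailing_onesui(unsigned int value) {
	return sketch_trailing_zeroesui(~value);
}
```

The complement trick is why an architecture that only has a leading-zero instruction can still serve all four operations cheaply.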

3.6.5. has_single_bit

This is a function that determines if an unsigned integer is a power of 2. It can be written either using a normal expression such as value != 0 && ((value & (value - 1)) == 0), or by using popcount(value) == 1. Checking that something is a power of 2 (or that it has a single bit set) is an operation used for checking if something can be turned into a mask value efficiently (useful in specific kinds of containers with specific bit limits like hash tables) and many other applications. This one does not map directly to a hardware instruction.

#include <stdbit.h>

_Bool stdc_has_single_bituc(unsigned char value);
_Bool stdc_has_single_bitus(unsigned short value);
_Bool stdc_has_single_bitui(unsigned int value);
_Bool stdc_has_single_bitul(unsigned long value);
_Bool stdc_has_single_bitull(unsigned long long value);

// type-generic macro
_Bool stdc_has_single_bit(generic_integer_type value);
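Both formulations mentioned above can be written out directly. This is a sketch with hypothetical sketch_ names; the population count is a plain loop standing in for whatever popcount an implementation provides.

```c
#include <stdbool.h>

/* Expression form: a power of 2 has no bits in common with value - 1. */
bool sketch_has_single_bitui(unsigned int value) {
	return value != 0 && (value & (value - 1)) == 0;
}

/* Population-count form: exactly one bit is set. */
bool sketch_has_single_bitui_popcount(unsigned int value) {
	int count = 0;
	while (value != 0) {
		value &= value - 1; /* clear the lowest set bit */
		++count;
	}
	return count == 1;
}
```

The explicit value != 0 test matters in the first form: without it, 0 would be reported as a power of 2.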

3.6.6. bit_width/bit_ceil/bit_floor

This set of functions provides a way to determine the number of bits it takes to represent a given value (bit_width), the next largest power of 2 from the value (bit_ceil), and the previous largest power of 2 from the value (bit_floor). All of these operations are extremely useful, especially in the context of GPUs. bit_width can be used to drastically simplify the implementation of both bit_ceil and bit_floor.

bit_width can be calculated with VALUE_WIDTH - stdc_leading_zeroes(value), where VALUE_WIDTH is one of the <limits.h> macros for the given unsigned integer type. bit_ceil's computation is subtle and involves a bit of preparation to avoid problems with integer promotions and bit shifts in specific cases (typically unsigned char, char, and unsigned short). This makes it a good candidate for standardization (since it can be hard to get right). See the appendix for an implementation. bit_floor is simpler, and consists of a simple computation of x == 0 ? 0 : (1 << (stdc_bit_width(x) - 1)) (with appropriately typed / casted constants so the right type is returned without promotions or casts).

#include <stdbit.h>

unsigned char stdc_bit_flooruc(unsigned char value);
unsigned short stdc_bit_floorus(unsigned short value);
unsigned int stdc_bit_floorui(unsigned int value);
unsigned long stdc_bit_floorul(unsigned long value);
unsigned long long stdc_bit_floorull(unsigned long long value);

unsigned char stdc_bit_ceiluc(unsigned char value);
unsigned short stdc_bit_ceilus(unsigned short value);
unsigned int stdc_bit_ceilui(unsigned int value);
unsigned long stdc_bit_ceilul(unsigned long value);
unsigned long long stdc_bit_ceilull(unsigned long long value);

unsigned char stdc_bit_widthuc(unsigned char value);
unsigned short stdc_bit_widthus(unsigned short value);
unsigned int stdc_bit_widthui(unsigned int value);
unsigned long stdc_bit_widthul(unsigned long value);
unsigned long long stdc_bit_widthull(unsigned long long value);

// type-generic macro
generic_integer_type stdc_bit_floor(generic_integer_type value);
generic_integer_type stdc_bit_ceil(generic_integer_type value);
generic_integer_type stdc_bit_width(generic_integer_type value);

Notably, bit_width requires a return type big enough to represent the result. Conceivably, it might be beneficial to synchronize these return types and just return int. But, in the case of something like an implementation for _BitInt(N), N can be so catastrophically enormous that we could not count it in a (presumably 16 or 32-bit) int or unsigned int type. C++ always returns the type T that was put in, and we follow that here since any type is large enough to hold its own width in bits. However, in anticipation of a potentially enormous N in _BitInt(N) (and not wanting to return an e.g. 4 GB _BitInt to represent the width of a _BitInt that has an N of 4 billion), we allow the return type for the generic functions to be a "suitably large unsigned integer type".
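The promotion concerns are easiest to see on the smallest type. The following sketch covers unsigned char only; the sketch_ names are hypothetical, bit_width is computed with a loop instead of a leading-zero count, and returning 0 from bit_ceil when the result is not representable is a choice made for this illustration (C++ leaves that case undefined).

```c
#include <limits.h>

/* Number of bits needed to represent value; 0 for value == 0. */
unsigned char sketch_bit_widthuc(unsigned char value) {
	unsigned char width = 0;
	while (value != 0) {
		value >>= 1;
		++width;
	}
	return width;
}

/* Largest power of 2 not greater than value; 0 for value == 0. */
unsigned char sketch_bit_flooruc(unsigned char value) {
	if (value == 0)
		return 0;
	/* Shift an unsigned constant, then narrow explicitly. */
	return (unsigned char)(1u << (sketch_bit_widthuc(value) - 1));
}

/* Smallest power of 2 not less than value. */
unsigned char sketch_bit_ceiluc(unsigned char value) {
	if (value <= 1)
		return 1;
	/* The width of value - 1 is the exponent of the answer. */
	const int exponent = sketch_bit_widthuc((unsigned char)(value - 1));
	if (exponent >= CHAR_BIT)
		return 0; /* assumption of this sketch: not representable */
	return (unsigned char)(1u << exponent);
}
```

Every shift operates on an unsigned int constant and is narrowed with an explicit cast, so the arithmetic never depends on what unsigned char promotes to.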

4. Committee Polls / Questions

For the Committee, this proposal is, effectively, five parts:

  1. the endianness definitions;

  2. the byteswap functions (generic and width-specific);

  3. the load/store, width-specific functions;

  4. the suite of low-level bit functions (mapping directly to instructions: leading|trailing_zeroes|ones, popcount, rotate_left|right); and,

  5. the suite of useful bit functions which may not map directly to instructions (bit_ceil, bit_floor, bit_width, has_single_bit).

These can be polled together or separately, depending on what the Committee desires. It is the author’s recommendation that all are adopted to make serialization and bit work with scalars much simpler and easier.

5. Wording

The following wording is relative to N2596.

5.1. Add <stdbit.h> to freestanding headers in §4, paragraph 6

A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (Clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, <stdbit.h>, and <stdnoreturn.h>.

5.2. Add a new bullet point at the top for globally-reserved macro and library names to §7.1.3 "Reserved Identifiers", paragraph 1.

— All identifiers starting with stdc_ are reserved for future use.

5.3. Add a new §7.3�x sub-clause for "Bit and Byte Utilities" in §7

7.3�x Bit and Byte Utilities <stdbit.h>

The header <stdbit.h> defines the following macros and enumeration constants, as well as declares the following types and functions, to work with the byte and bit representation of many types, typically integer types. This header makes available the size_t type name (7.19) and any uintN_t or uint_leastN_t type names defined by the implementation (7.20).

5.3.1. Add a new §7.3�x.1 sub-sub-clause for "Endian" in §7.3�x

7.3�x.1 Endian

Two common methods of byte ordering in multi-byte scalar types are big-endian and little-endian. Big-endian is a format for storage or transmission of binary data in which the most significant byte is placed first, with the rest in descending order. Little-endian is a format for storage or transmission of binary data in which the least significant byte is placed first, with the rest in ascending order. Other byte orderings are also possible. In declarations and definitions in 7.3�x, a suffix containing le typically represents little-endian. A suffix containing be typically represents big-endian. This clause describes the endianness of the execution environment with respect to standard, extended, and bit-precise integer types without padding bits.
It is unspecified whether any generic function declared in <stdbit.h> is a macro or an identifier declared with external linkage. If a macro definition is suppressed in order to access an actual function, or a program defines an external identifier with the name of a generic function, the behavior is undefined.
The macros are:
__STDC_ENDIAN_LITTLE__

which represents a method of byte order storage in which the least significant byte is placed first and the rest are in ascending order, and is suitable for use in an #if preprocessing directive;

__STDC_ENDIAN_BIG__

which represents a method of byte order storage in which the most significant byte is placed first and the rest are in descending order, and is suitable for use in an #if preprocessing directive;

__STDC_ENDIAN_NATIVE__ /* see below */

which represents the method of byte order storage for the execution environment and is suitable for use in an #if preprocessing directive.

__STDC_ENDIAN_NATIVE__ shall be identical to __STDC_ENDIAN_LITTLE__ if the execution environment is little-endian. Otherwise, __STDC_ENDIAN_NATIVE__ shall be identical to __STDC_ENDIAN_BIG__ if the execution environment is big-endian. If __STDC_ENDIAN_NATIVE__ is not equivalent to either, then the byte order for the execution environment is implementation-defined.
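As a non-normative illustration, the sketch below shows how the three macros might be used in #if directives. Since <stdbit.h> may not be available, it does not use the proposed names: the SKETCH_ENDIAN_* macros and both functions are hypothetical stand-ins, derived here from the __BYTE_ORDER__ family of predefined macros that GCC and Clang provide.

```c
/* Hypothetical SKETCH_* names stand in for the proposed __STDC_ENDIAN_*
   macros; they are derived from GCC/Clang predefined macros when present. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#define SKETCH_ENDIAN_LITTLE __ORDER_LITTLE_ENDIAN__
#define SKETCH_ENDIAN_BIG    __ORDER_BIG_ENDIAN__
#define SKETCH_ENDIAN_NATIVE __BYTE_ORDER__
#else
/* Assumption: arbitrary distinct values; the native order stays unknown. */
#define SKETCH_ENDIAN_LITTLE 1
#define SKETCH_ENDIAN_BIG    2
#define SKETCH_ENDIAN_NATIVE 0
#endif

/* Returns 1 for little-endian, 2 for big-endian and 0 for anything else,
   decided entirely by the preprocessor. */
int sketch_native_order(void) {
#if SKETCH_ENDIAN_NATIVE == SKETCH_ENDIAN_LITTLE
    return 1;
#elif SKETCH_ENDIAN_NATIVE == SKETCH_ENDIAN_BIG
    return 2;
#else
    return 0;
#endif
}

/* Run-time cross-check: is the least significant byte stored first? */
int sketch_lsb_stored_first(void) {
    unsigned int probe = 1u;
    return *(unsigned char *)&probe == 1u;
}
```

The run-time probe is only there to cross-check the translation-time answer; code that needs the byte order would normally branch on the macros alone.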

The enumeration type is
stdc_endian

which is both a typedef name and a tag name, whose corresponding enumeration constants are listed below.

The enumeration constants are:
stdc_endian_little = __STDC_ENDIAN_LITTLE__

which represents a method of byte order storage in which the least significant byte is placed first and the rest are in ascending order;

stdc_endian_big = __STDC_ENDIAN_BIG__

which represents a method of byte order storage in which the most significant byte is placed first and the rest are in descending order; and,

stdc_endian_native = __STDC_ENDIAN_NATIVE__

which represents the method of byte order storage for the execution environment.FOOTNOTE�0)

FOOTNOTE�0)Comparing the enumeration constants by stdc_endian_native == stdc_endian_little or stdc_endian_native == stdc_endian_big is the same as checking the macros for whether or not the execution environment is big-endian, little-endian, or neither.

5.3.2. Add a new §7.3�x.2 sub-sub-clause for "Memory Reversal" in §7.3�x

7.3�x.2 Memory Reversal

Synopsis

#include <stdbit.h>

void stdc_memreverse(size_t n, unsigned char ptr[static n]);

Description

The stdc_memreverse function provides an interface to reverse the order of a given sequence of bytes. ptr must be a pointer to an object of at least n bytes. If n is less than or equal to 1, then the function has no effect. Otherwise, let R represent the byte sequence represented by ptr and n in reverse order. Each byte’s value is exchanged with the value in its corresponding reverse position in R.
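A minimal, non-normative sketch of the behavior described above follows. The name sketch_memreverse is hypothetical, and a plain pointer parameter is used in place of the proposed ptr[static n] declarator so that n == 0 remains a valid call.

```c
#include <stddef.h>

/* Hypothetical stand-in for the proposed stdc_memreverse: exchanges each
   byte with the byte at its mirrored position in the sequence. */
void sketch_memreverse(size_t n, unsigned char *ptr) {
    if (n <= 1) {
        return; /* nothing to reverse */
    }
    for (size_t lo = 0, hi = n - 1; lo < hi; ++lo, --hi) {
        unsigned char tmp = ptr[lo];
        ptr[lo] = ptr[hi];
        ptr[hi] = tmp;
    }
}
```

For an odd n, the middle byte is its own mirror and is left untouched.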

7.3�x.3 Exact-width Byte Swap

Synopsis

#include <stdbit.h>

uintN_t stdc_byteswapuN(uintN_t value);

Description

The stdc_byteswapuN functions provide an interface to swap the bytes of a corresponding uintN_t object, where N matches one of the exact-width integer types (7.20.1.1). If an implementation provides the corresponding uintN_t typedef, it shall define the corresponding byte swap function for that value of N.

Returns

Returns a byte swapped uintN_t value, as if by invoking stdc_memreverse(sizeof(value), (unsigned char*)&value).
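As a non-normative illustration for N = 32, the byte swap can be sketched with shifts and masks. The name sketch_byteswapu32 is hypothetical, and the sketch assumes the implementation provides uint32_t with 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical stand-in for the proposed stdc_byteswapu32; assumes the
   implementation provides uint32_t and that bytes are 8 bits wide. */
uint32_t sketch_byteswapu32(uint32_t value) {
    return ((value & UINT32_C(0x000000FF)) << 24) |
           ((value & UINT32_C(0x0000FF00)) << 8) |
           ((value & UINT32_C(0x00FF0000)) >> 8) |
           ((value & UINT32_C(0xFF000000)) >> 24);
}
```

Applying the function twice yields the original value, which is the same property a byte-sequence reversal has.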

5.3.3. Add a new §7.3�x.4 sub-sub-clause for "Endian Aware" functions in §7.3�x

7.3�x.4 Endian-Aware Load

Synopsis

#include <stdbit.h>

uint_leastN_t stdc_load_leuN(unsigned char const ptr[static sizeof(uint_leastN_t)]);
uint_leastN_t stdc_load_beuN(unsigned char const ptr[static sizeof(uint_leastN_t)]);

Description

The stdc_load_leuN and stdc_load_beuN functions return a uint_leastN_t object by loading the bits from ptr in an endian-aware (7.3�x.1) manner, where N matches an existing minimum-width integer type (7.20.1.2). If an implementation provides the corresponding uint_leastN_t typedef, it shall define the corresponding endian-aware load functions.

Returns

For each 8-bit grouping in ptr up to N bits:
— let L be N / 8;
— let index be an integer in the range [0, L);
— let b be the 8-bit value of that grouping;
— and, let S be index * 8 for le functions and ((N - 8) - (index * 8)) for be functions.
Returns the summation of b << S for each index in the range [0, L). Any bits not written to in the summation are set to 0.
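The summation above can be sketched, non-normatively, for N = 32. The names sketch_load_leu32 and sketch_load_beu32 are hypothetical, and the sketch assumes each element of ptr carries one 8-bit grouping.

```c
#include <stdint.h>

/* Hypothetical stand-in for stdc_load_leu32: S = index * 8. */
uint_least32_t sketch_load_leu32(const unsigned char *ptr) {
    uint_least32_t result = 0;
    for (unsigned index = 0; index < 32u / 8u; ++index) {
        uint_least32_t b = ptr[index] & 0xFFu; /* the 8-bit grouping */
        result |= b << (index * 8u);
    }
    return result;
}

/* Hypothetical stand-in for stdc_load_beu32: S = (N - 8) - index * 8. */
uint_least32_t sketch_load_beu32(const unsigned char *ptr) {
    uint_least32_t result = 0;
    for (unsigned index = 0; index < 32u / 8u; ++index) {
        uint_least32_t b = ptr[index] & 0xFFu; /* the 8-bit grouping */
        result |= b << ((32u - 8u) - index * 8u);
    }
    return result;
}
```

Because the result is assembled from shifts rather than copied from memory, the same code gives the same answer regardless of the byte order of the execution environment.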

7.3�x.5 Endian-Aware Store

Synopsis

#include <stdbit.h>

void stdc_store_leuN(uint_leastN_t value,
                     unsigned char ptr[static sizeof(uint_leastN_t)]);
void stdc_store_beuN(uint_leastN_t value,
                     unsigned char ptr[static sizeof(uint_leastN_t)]);

Description

The stdc_store_leuN and stdc_store_beuN functions copy the bytes of a uint_leastN_t object to ptr in an endian-aware (7.3�x.1) manner, where N matches an existing minimum-width integer type (7.20.1.2). The le stands for little-endian and the be stands for big-endian.

Returns

For each 8-bit grouping up to N bits in ptr and value:
— let L be N / 8;
— let index be an integer in the range [0, L);
— let pb be the 8-bit value within the grouping in ptr at a bit offset of index * 8;
— and, let vb be the 8-bit value within the grouping of value at a bit offset of index * 8 for le functions and ((N - 8) - (index * 8)) for be functions.
Performs pb = vb for each index in the range [0, L). Any bits not written to in ptr retain the value they had before the function was executed.
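A non-normative sketch of the store operation for N = 32 follows. The names sketch_store_leu32 and sketch_store_beu32 are hypothetical, and each element of ptr is assumed to hold one 8-bit grouping.

```c
#include <stdint.h>

/* Hypothetical stand-in for stdc_store_leu32: the grouping at bit offset
   index * 8 of value goes to ptr[index]. */
void sketch_store_leu32(uint_least32_t value, unsigned char *ptr) {
    for (unsigned index = 0; index < 32u / 8u; ++index) {
        ptr[index] = (unsigned char)((value >> (index * 8u)) & 0xFFu);
    }
}

/* Hypothetical stand-in for stdc_store_beu32: the grouping at bit offset
   (N - 8) - index * 8 of value goes to ptr[index]. */
void sketch_store_beu32(uint_least32_t value, unsigned char *ptr) {
    for (unsigned index = 0; index < 32u / 8u; ++index) {
        ptr[index] =
            (unsigned char)((value >> ((32u - 8u) - index * 8u)) & 0xFFu);
    }
}
```

Only the first four elements of ptr are written; anything beyond them is left as it was.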

5.3.4. Add a new §7.3�x.6 sub-sub-clause for Low-Level Bit Utilities in §7.3�x

7.3�x.6 Count Leading Zeroes

Synopsis

int stdc_leading_zeroesuc(unsigned char value);
int stdc_leading_zeroesus(unsigned short value);
int stdc_leading_zeroesui(unsigned int value);
int stdc_leading_zeroesul(unsigned long value);
int stdc_leading_zeroesull(unsigned long long value);

int stdc_leading_zeroes(generic_integer_type value);

Returns

Returns the number of consecutive 0 bits in value, starting from the most significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
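A portable, non-normative sketch of the unsigned int variant is shown below. The name sketch_leading_zeroesui is hypothetical, and the sketch assumes unsigned int has no padding bits, so that its width is sizeof(unsigned int) * CHAR_BIT.

```c
#include <limits.h>

/* Hypothetical stand-in for stdc_leading_zeroesui. Walks a one-bit mask
   down from the most significant bit until a 1 bit is found. */
int sketch_leading_zeroesui(unsigned int value) {
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    int count = 0;
    for (unsigned int mask = 1u << (width - 1);
         mask != 0 && (value & mask) == 0; mask >>= 1) {
        ++count;
    }
    return count;
}
```

An input of 0 yields the full width of the type, a case where compiler intrinsics such as __builtin_clz leave the result undefined.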

7.3�x.7 Count Leading Ones

Synopsis

int stdc_leading_onesuc(unsigned char value);
int stdc_leading_onesus(unsigned short value);
int stdc_leading_onesui(unsigned int value);
int stdc_leading_onesul(unsigned long value);
int stdc_leading_onesull(unsigned long long value);

int stdc_leading_ones(generic_integer_type value);

Returns

Returns the number of consecutive 1 bits in value, starting from the most significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
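The same mask-walking approach gives a non-normative sketch for counting leading ones. The name sketch_leading_onesui is hypothetical, and unsigned int is assumed to have no padding bits.

```c
#include <limits.h>

/* Hypothetical stand-in for stdc_leading_onesui. Walks a one-bit mask
   down from the most significant bit until a 0 bit is found. */
int sketch_leading_onesui(unsigned int value) {
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);
    int count = 0;
    for (unsigned int mask = 1u << (width - 1);
         mask != 0 && (value & mask) != 0; mask >>= 1) {
        ++count;
    }
    return count;
}
```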

7.3�x.8 Count Trailing Zeroes

Synopsis

int stdc_trailing_zeroesuc(unsigned char value);
int stdc_trailing_zeroesus(unsigned short value);
int stdc_trailing_zeroesui(unsigned int value);
int stdc_trailing_zeroesul(unsigned long value);
int stdc_trailing_zeroesull(unsigned long long value);

int stdc_trailing_zeroes(generic_integer_type value);

Returns

Returns the number of consecutive 0 bits in value, starting from the least significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
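A non-normative sketch for the unsigned int variant follows. The name sketch_trailing_zeroesui is hypothetical, and unsigned int is assumed to have no padding bits.

```c
#include <limits.h>

/* Hypothetical stand-in for stdc_trailing_zeroesui. Shifts value right
   until its least significant bit is 1. */
int sketch_trailing_zeroesui(unsigned int value) {
    if (value == 0) {
        /* every bit is 0, so the count is the width of the type */
        return (int)(sizeof(unsigned int) * CHAR_BIT);
    }
    int count = 0;
    while ((value & 1u) == 0) {
        value >>= 1;
        ++count;
    }
    return count;
}
```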

7.3�x.9 Count Trailing Ones

Synopsis

int stdc_trailing_onesuc(unsigned char value);
int stdc_trailing_onesus(unsigned short value);
int stdc_trailing_onesui(unsigned int value);
int stdc_trailing_onesul(unsigned long value);
int stdc_trailing_onesull(unsigned long long value);

int stdc_trailing_ones(generic_integer_type value);

Returns

Returns the number of consecutive 1 bits in value, starting from the least significant bit.
The type-generic function (marked by its generic_integer_type argument) returns the appropriate value based on the type of the input value, so long as it is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
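A non-normative sketch for the unsigned int variant is shown below; the name sketch_trailing_onesui is hypothetical.

```c
/* Hypothetical stand-in for stdc_trailing_onesui. Shifts value right
   while its least significant bit is 1; zeros shifted in from the top
   guarantee termination even when every bit is set. */
int sketch_trailing_onesui(unsigned int value) {
    int count = 0;
    while ((value & 1u) != 0) {
        value >>= 1;
        ++count;
    }
    return count;
}
```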

7.3�x.10 Rotate Left

Synopsis

unsigned char stdc_rotate_leftuc(unsigned char value, int count);
unsigned short stdc_rotate_leftus(unsigned short value, int count);
unsigned int stdc_rotate_leftui(unsigned int value, int count);
unsigned long stdc_rotate_leftul(unsigned long value, int count);
unsigned long long stdc_rotate_leftull(unsigned long long value, int count);

generic_integer_type stdc_rotate_left(generic_integer_type value, int count);

Description

The stdc_rotate_left functions perform a bitwise rotate left. This operation is typically known as a left circular shift.

Returns

Let N be the width corresponding to the type of the input value. Let r be count % N.
— If r is 0, returns value;
— otherwise, returns (value << r) | (value >> (N - r)).
The type-generic function (marked by its generic_integer_type argument) returns the above described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
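The formula above can be sketched, non-normatively, for unsigned int. The name sketch_rotate_leftui is hypothetical and unsigned int is assumed to have no padding bits. The wording above does not say what happens when r is negative; folding a negative r back into [0, N) is purely an assumption of this sketch, chosen so that no shift count is ever negative.

```c
#include <limits.h>

/* Hypothetical stand-in for stdc_rotate_leftui. */
unsigned int sketch_rotate_leftui(unsigned int value, int count) {
    const int N = (int)(sizeof(unsigned int) * CHAR_BIT);
    int r = count % N;
    if (r < 0) {
        r += N; /* assumption of this sketch, not taken from the wording */
    }
    if (r == 0) {
        return value; /* avoids shifting by the full width N */
    }
    return (value << r) | (value >> (N - r));
}
```

The early return for r == 0 matters because shifting by N bits is undefined in C.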

7.3�x.11 Rotate Right

Synopsis

unsigned char stdc_rotate_rightuc(unsigned char value, int count);
unsigned short stdc_rotate_rightus(unsigned short value, int count);
unsigned int stdc_rotate_rightui(unsigned int value, int count);
unsigned long stdc_rotate_rightul(unsigned long value, int count);
unsigned long long stdc_rotate_rightull(unsigned long long value, int count);

generic_integer_type stdc_rotate_right(generic_integer_type value, int count);

Description

The stdc_rotate_right functions perform a bitwise rotate right. This operation is typically known as a right circular shift.

Returns

Let N be the width corresponding to the type of the input value. Let r be count % N.
— If r is 0, returns value;
— otherwise, if r is positive, returns (value >> r) | (value << (N - r)).
The type-generic function (marked by its generic_integer_type argument) returns the above described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
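A non-normative sketch of the right rotation for unsigned int follows. The name sketch_rotate_rightui is hypothetical and unsigned int is assumed to have no padding bits. The handling of a negative r (folding it back into [0, N)) is an assumption of this sketch and is not taken from the wording above.

```c
#include <limits.h>

/* Hypothetical stand-in for stdc_rotate_rightui. */
unsigned int sketch_rotate_rightui(unsigned int value, int count) {
    const int N = (int)(sizeof(unsigned int) * CHAR_BIT);
    int r = count % N;
    if (r < 0) {
        r += N; /* assumption of this sketch, not taken from the wording */
    }
    if (r == 0) {
        return value; /* avoids shifting by the full width N */
    }
    return (value >> r) | (value << (N - r));
}
```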

7.3�x.12 Population Count

Synopsis

int stdc_popcountuc(unsigned char value);
int stdc_popcountus(unsigned short value);
int stdc_popcountui(unsigned int value);
int stdc_popcountul(unsigned long value);
int stdc_popcountull(unsigned long long value);

int stdc_popcount(generic_integer_type value);

Returns

The stdc_popcount functions return the total number of 1 bits within the given value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
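Where no dedicated instruction or intrinsic is available, the count can be computed portably. The non-normative sketch below uses the well-known technique of repeatedly clearing the lowest set bit; the name sketch_popcountui is hypothetical.

```c
/* Hypothetical stand-in for stdc_popcountui. Each iteration clears the
   lowest 1 bit, so the loop runs once per 1 bit in value. */
int sketch_popcountui(unsigned int value) {
    int count = 0;
    while (value != 0) {
        value &= value - 1u;
        ++count;
    }
    return count;
}
```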

5.3.5. Add a new §7.3�x.13 sub-sub-clause for Fundamental Bit Utilities in §7.3�x

7.3�x.13 Single-bit Check

Synopsis

_Bool stdc_has_single_bituc(unsigned char value);
_Bool stdc_has_single_bitus(unsigned short value);
_Bool stdc_has_single_bitui(unsigned int value);
_Bool stdc_has_single_bitul(unsigned long value);
_Bool stdc_has_single_bitull(unsigned long long value);

_Bool stdc_has_single_bit(generic_integer_type value);

Returns

The stdc_has_single_bit functions return true if and only if there is a single 1 bit in value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
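A non-normative sketch of the check for unsigned int is shown below; the name sketch_has_single_bitui is hypothetical. It relies on the fact that subtracting 1 from a power of 2 clears its only set bit and sets every bit below it.

```c
/* Hypothetical stand-in for stdc_has_single_bitui. A nonzero value has
   exactly one 1 bit when clearing its lowest 1 bit leaves nothing. */
_Bool sketch_has_single_bitui(unsigned int value) {
    return value != 0 && (value & (value - 1u)) == 0;
}
```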

7.3�x.14 Bit Width

Synopsis

size_t stdc_bit_widthuc(unsigned char value);
size_t stdc_bit_widthus(unsigned short value);
size_t stdc_bit_widthui(unsigned int value);
size_t stdc_bit_widthul(unsigned long value);
size_t stdc_bit_widthull(unsigned long long value);

size_t stdc_bit_width(generic_integer_type value);

Description

The stdc_bit_width functions compute the smallest number of bits needed to store value.

Returns

The stdc_bit_width functions return 0 if value is 0. Otherwise, they return 1 + floor(log2(value)).
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
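As a non-normative illustration, the result can be computed by counting how many right shifts it takes to reach 0. The name sketch_bit_widthui is hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for stdc_bit_widthui. Counts the shifts needed
   to reduce value to 0, which equals the position of its highest 1 bit
   counted from 1. */
size_t sketch_bit_widthui(unsigned int value) {
    size_t width = 0;
    while (value != 0) {
        value >>= 1;
        ++width;
    }
    return width;
}
```

For example, 5 is 101 in binary and needs 3 bits, while 8 is 1000 and needs 4.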

7.3�x.15 Bit Floor

Synopsis

unsigned char stdc_bit_flooruc(unsigned char value);
unsigned short stdc_bit_floorus(unsigned short value);
unsigned int stdc_bit_floorui(unsigned int value);
unsigned long stdc_bit_floorul(unsigned long value);
unsigned long long stdc_bit_floorull(unsigned long long value);

generic_integer_type stdc_bit_floor(generic_integer_type value);

Description

The stdc_bit_floor functions compute the largest integral power of 2 that is not greater than value.

Returns

The stdc_bit_floor functions return 0 if value is 0. Otherwise, they return the largest integral power of 2 that is not greater than value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
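A non-normative sketch for unsigned int follows; the name sketch_bit_floorui is hypothetical.

```c
/* Hypothetical stand-in for stdc_bit_floorui. Doubles result once for
   every bit position below the highest 1 bit of value. */
unsigned int sketch_bit_floorui(unsigned int value) {
    if (value == 0) {
        return 0;
    }
    unsigned int result = 1u;
    while ((value >>= 1) != 0) {
        result <<= 1;
    }
    return result;
}
```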

7.3�x.16 Bit Ceiling

Synopsis

unsigned char stdc_bit_ceiluc(unsigned char value);
unsigned short stdc_bit_ceilus(unsigned short value);
unsigned int stdc_bit_ceilui(unsigned int value);
unsigned long stdc_bit_ceilul(unsigned long value);
unsigned long long stdc_bit_ceilull(unsigned long long value);

generic_integer_type stdc_bit_ceil(generic_integer_type value);

Description

The stdc_bit_ceil functions compute the smallest integral power of 2 that is not less than value. If the computation does not fit in the given return type, the behavior is undefined.

Returns

The stdc_bit_ceil functions return the smallest integral power of 2 that is not less than value.
The type-generic function (marked by its generic_integer_type argument) returns the previously described result for a given input value so long as the generic_integer_type is an
— standard unsigned integer type;
— extended unsigned integer type;
— or, bit-precise unsigned integer type whose size matches a standard or extended integer type.
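A non-normative sketch for unsigned int is shown below; the name sketch_bit_ceilui is hypothetical. The wording above leaves the behavior undefined when the result does not fit in the return type; returning 0 in that case is only a choice made by this sketch so that the loop terminates.

```c
/* Hypothetical stand-in for stdc_bit_ceilui. Doubles result until it is
   no longer less than value. */
unsigned int sketch_bit_ceilui(unsigned int value) {
    unsigned int result = 1u;
    while (result < value) {
        result <<= 1;
        if (result == 0) {
            /* the power of 2 is not representable: sketch-only choice,
               the proposal leaves this case undefined */
            return 0;
        }
    }
    return result;
}
```

Inputs of 0 and 1 both yield 1, the smallest power of 2.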

5.4. Add one new entry for Implementation-defined behavior in Annex J.3

— The value of stdc_endian_native and __STDC_ENDIAN_NATIVE__ if the execution environment is not big-endian or little-endian (7.3�x.1).

5.5. Modify an existing entry for Unspecified behavior in Annex J.1

— The macro definition of a generic function is suppressed in order to access an actual function (7.17.1), (7.3�x).

6. Appendix

A collection of miscellaneous and helpful bits of information and implementation.

6.1. Implementation of Generic popcount

Sample implementation on Godbolt (clang/gcc specific builtins):

#define stdc_popcount(...) \
	_Generic((__VA_ARGS__), \
		char: __builtin_popcount, \
		unsigned char: __builtin_popcount, \
		unsigned short: __builtin_popcount, \
		unsigned int: __builtin_popcount, \
		unsigned long: __builtin_popcountl, \
		unsigned long long: __builtin_popcountll \
	)(__VA_ARGS__)

int main () {
	return stdc_popcount((unsigned char)'0') + stdc_popcount(13ull);
}

6.2. Implementation of Generic bit_ceil

Sample implementation on Godbolt (clang/gcc specific builtins):

#include <limits.h>

#define stdc_leading_zeroes(...) \
	(_Generic((__VA_ARGS__), \
		char: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(char)) * CHAR_BIT), \
		unsigned char: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(unsigned char)) * CHAR_BIT), \
		unsigned short: __builtin_clz((__VA_ARGS__)) - ((sizeof(unsigned) - sizeof(unsigned short)) * CHAR_BIT), \
		unsigned int: __builtin_clz((__VA_ARGS__)), \
		unsigned long: __builtin_clzl((__VA_ARGS__)), \
		unsigned long long: __builtin_clzll((__VA_ARGS__)) \
	))

#define stdc_bit_width(...) \
	_Generic((__VA_ARGS__), \
		char: (CHAR_BIT - stdc_leading_zeroes((__VA_ARGS__))), \
		unsigned char: (UCHAR_WIDTH - stdc_leading_zeroes((__VA_ARGS__))), \
		unsigned short: (USHRT_WIDTH - stdc_leading_zeroes((__VA_ARGS__))), \
		unsigned int: (UINT_WIDTH - stdc_leading_zeroes((__VA_ARGS__))), \
		unsigned long: (ULONG_WIDTH - stdc_leading_zeroes((__VA_ARGS__))), \
		unsigned long long: (ULLONG_WIDTH - stdc_leading_zeroes((__VA_ARGS__))) \
	)

// integer promotion rules means we need to
// precisely calculate the value here ~_~
#define __stdc_bit_ceil_promotion_protection(_Type, _Value) \
	_Generic((_Value), \
		char: (_Value <= (_Type)1) ? (_Type)0 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
		unsigned char: (_Value <= (_Type)1) ? (_Type)0 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - UCHAR_WIDTH)) >> (UINT_WIDTH - UCHAR_WIDTH)), \
		unsigned short: (_Value <= (_Type)1) ? (_Type)0 : (_Type)(1u << (stdc_bit_width((_Type)(_Value - 1)) + (UINT_WIDTH - USHRT_WIDTH)) >> (UINT_WIDTH - USHRT_WIDTH)), \
		default: (_Type)0 \
	)

#define stdc_bit_ceil(...) \
	_Generic((__VA_ARGS__), \
		char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
		unsigned char: __stdc_bit_ceil_promotion_protection(unsigned char, (__VA_ARGS__)), \
		unsigned short: __stdc_bit_ceil_promotion_protection(unsigned short, (__VA_ARGS__)), \
		unsigned int: (unsigned int)(1u << stdc_bit_width((unsigned int)((__VA_ARGS__) - 1))), \
		unsigned long: (unsigned long)(1ul << stdc_bit_width((unsigned long)((__VA_ARGS__) - 1))), \
		unsigned long long: (unsigned long long)(1ull << stdc_bit_width((unsigned long long)((__VA_ARGS__) - 1))) \
	)

int main () {
	int x = stdc_bit_ceil((unsigned char)'\x13');
	int y = stdc_bit_ceil(33u);
	return x + y;
}

References

Informative References

[ARM-SETEND]
armKEIL. SETEND instruction: ARM and Thumb instructions. December 31st, 2019. URL: https://www.keil.com/support/man/docs/armasm/armasm_dom1361289895072.htm
[CLANG-BUILTINS]
LLVM Foundation; Clang Contributors. Clang Language Extensions: Clang Documentation. September 1st, 2021. URL: https://clang.llvm.org/docs/LanguageExtensions.html#intrinsics-support-within-constant-expressions
[ENDIAN-FALLACY]
Rob Pike. The Byte Order Fallacy. April 3rd, 2012. URL: https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
[GCC-BUILTINS]
GCC Contributors. Other Built-in Functions Provided by GCC. September 1st, 2021. URL: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
[LIBCORK-BYTE-ORDER]
Douglas Creager. libcork: Byte order. November 22nd, 2017. URL: https://libcork.io/0.15.0/byte-order.html
[LINUX-ENDIAN]
Linux; BSD. endian(3). September 1st, 2021. URL: https://linux.die.net/man/3/endian
[MSVC-BUILTINS]
Microsoft. _byteswap_uint64, _byteswap_ulong, _byteswap_ushort. November 4th, 2016. URL: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/byteswap-uint64-byteswap-ulong-byteswap-ushort?view=msvc-160
[N2596]
ISO/IEC JTC1 SC22 WG14 - Programming Languages, C; JeanHeyd Meneide; Freek Wiedijk. N2596: ISO/IEC 9899:202x - Programming Languages, C. December 11th, 2020. URL: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2596.pdf
[NTOHL]
Linux. ntohl(3). September 30th, 2021. URL: https://linux.die.net/man/3/ntohl
[P0463]
Howard E. Hinnant. endian. Just endian.. July 13th, 2017. URL: https://wg21.link/p0463
[P0553]
Jens Maurer. Bit operations. March 1st, 2019. URL: https://wg21.link/p0553
[PORTABLE-ENDIANNESS]
David Seifert. portable-endianness. May 16th, 2021. URL: https://github.com/SoapGentoo/portable-endianness
[SDCC]
Dr. Philipp K. Krause. SDCC Manual §8.1.9 - Bit Rotations. September 25th, 2021. URL: http://sdcc.sourceforge.net/doc/sdccman.pdf
[TI-TMS320C64X]
Texas Instruments. TMS320C64x/C64x+ DSP: CPU and Instruction Set. July 31st, 2010. URL: https://www.ti.com/lit/ug/spru732j/spru732j.pdf