N2349 - Toward more efficient string copying and concatenation

Document number: N2349
Submitter: Martin Sebor
Submission Date: March 18, 2019
Subject: Toward more efficient string copying and concatenation

Summary

Among the most heavily used string handling functions declared in the standard C <string.h> header are those that copy and concatenate strings. Both sets of functions copy characters from one object to another, and both return their first argument: a pointer to the beginning of the destination object. The choice of the return value is a source of inefficiency that this proposal seeks to address.

Note that the code examples shown in this paper are for illustration only. They should not be viewed as recommended practice and may contain subtle bugs.

Standard Solution

The design of returning the functions' first argument is sometimes questioned by users wondering about its purpose – see for example strcpy() return value, or C: Why does strcpy return its argument? The simple answer is that it's due to an historical accident. The first subset of the functions was introduced in the Seventh Edition of UNIX in 1979 and consisted of strcat, strncat, strcpy and strncpy. Even though all four functions were used in the implementation of UNIX, some extensively, none of their calls made use of their return value. The functions could have just as easily, and as it turns out, far more usefully, been defined to return a pointer to the last copied character, or just past it.

The optimal complexity of concatenating two or more strings is linear in the number of characters. But, as stated above, having the functions return the destination pointer leads to the operation being significantly less than optimally efficient. The functions traverse the source and destination sequences and obtain the pointers to the end of both. The pointers point either at or just past the terminating nul character that the functions (with the exception of strncpy) append to the destination. However, by returning a pointer to the first character rather than the last (or one just past it), the position of the NUL character is lost and must be computed again when it's needed. This inefficiency can be illustrated on an example concatenating two strings, s1 and s2, into the destination buffer d. The idiomatic (though far from ideal) way to append two strings is by calling the strcpy and strcat functions as follows.

	strcat (strcpy (d, s1), s2);

To perform the concatenation, one pass over s1 and one pass over s2 is all that is necessary in addition to the corresponding pass over d that happens at the same time, but the call above makes two passes over s1. Let's break up the calls into two statements.

	char *d1 = strcpy (d, s1);   // pass 1 over s1
	strcat (d1, s2);             // pass 2 over the copy of s1 in d

Because strcpy returns the value of its first argument, d, the value of d1 is the same as d. For simplicity, the examples that follow use d instead of storing the return value in d1 and using it. In the strcat call, determining the position of the last character involves traversing the characters just copied to d1. The cost of doing this is linear in the length of the first string, s1. The cost is multiplied with each appended string, and so tends toward quadratic in the number of concatenations times the lengths of all the concatenated strings. This inefficiency is so infamous to have earned itself a name: Schlemiel the Painter's algorithm. See also [1].

It's important to point out that in addition to being inefficient, strcat and strcpy are notorious for their propensity for buffer overflow because neither provides a bound on the number of copied characters.

Attempts To Overcome Limitations

When the lengths of the strings are unknown and the destination size is fixed, following some popular secure coding guidelines to constrain the result of the concatenation to the destination size would actually lead to two redundant passes. For example, following the CERT advisory on the safe uses of strncpy() and strncat() and with the size of the destination being dsize bytes, we might end up with the following code.

	strncpy (d, s1, dsize - 1);      // pass 1 over s1 plus over d up to dsize - 1
	d[dsize - 1] = '\0';             // remember to nul-terminate
	size_t n = strlen (d);           // pass 2 over copy of s1 in d
	strncat (d, s2, dsize - n - 1);  // pass 3 over copy of s1 in d

Note that unlike the call to strncat, the call to strncpy above does not append the terminating NUL character to d when s1 is longer than d's size. It's a common mistake to assume it does. In addition, when s1 is shorter than dsize - 1 the strncpy funcion sets all the remaining characters to NUL which is also considered wasteful since the subsequent call to strncat will end up overwriting them.

In a futile effort to avoid some of the redundancy, programmers sometimes opt to first compute the string lengths and then use memcpy as shown below. This approach, while still less than optimally efficient, is even more error-prone and difficult to read and maintain.

	size_t s1len = strlen (s1);      // pass 1 over s1
	if (dsize <= s1len)
          s1len = dsize - 1;             // no need to nul-terminate
	memcpy (d, s1, s1len);           // pass 2 over s1
	size_t s2len = strlen (s2);      // pass 1 over s2
	if (dsize - s1len <= s2len)
          s2len = dsize - s1len - 1;
	memcpy (d + s1len, s2, s2len);   // pass 2, over s2
	d[s1len + s1len] = '\0';         // nul-terminate result

Using `sprintf` And `snprintf` For Concatenation

Programmers concerned about the complexity and readability of their code sometimes use the snprintf function instead.

	snprintf (d, dsize, "%s%s", s1, s2);

This results in code that is eminently readable but, owing to snprintf's considerable overhead, can be orders of magnitude slower than using the string functions despite their inefficiencies. The overhead is due not only to parsing the format string but also to complexities typically inherent in implementations of formatted I/O functions.

Some compilers such as GCC and Clang attempt to avoid the overhead of some calls to I/O functions by transforming very simple sprintf and snprintf calls to those to strcpy or memcpy for efficiency. See a live example online. However, the corresponding transformation is rarely performed for snprintf because there is no equivalent string function in the C library (the transformation is only done when the snprintf call can be proven not to result in the truncation of output). memcpy alone is not suitable because it copies exactly as many bytes as specified, and neither is strncpy because it overwrites the destination even past the end of the final NUL character. The overhead of transforming snprintf calls to a sequence of strlen and memcpy calls is not viewed as sufficiently profitable due to the redundant pass over the string. The section titled Better builtin string functions lists some of the limitations of the GCC optimizer in this area as well as some of the tradeoffs involved in improving it.

POSIX `stpcpy` and `stpncpy`

A number of library solutions that are outside the C standard have emerged over the years to help deal with this problem. The POSIX standard includes the stpcpy and stpncpy functions that return a pointer to the NUL character if it is found and can be used to mitigate the inconvenience and inefficiency discussed above.

	const char* stpcpy (char* restrict, const char* restrict);
	const char* stpncpy (char* restrict, const char* restrict, size_t);

In particular, where buffer overflow is not a concern, stpcpy can be called like so to concatenate strings:

	stpcpy (stpcpy (d, s1), s2);

However, using stpncpy equivalently when the copy must be bounded by the size of the destination does not eliminate the overhead of zeroing out the rest of the destination after the first nul character and up to the maximum of characters specified by the bound.

	char *ret = stpncpy (d, dsize, s1);   // zeroes out d beyond the end of s1
	dsize -= (ret - d);
	stpncpy (d, dsize, s2);               // again zeroes out d beyond the end

As a result, the function is still inefficient because each call to it zeroes out the space remaining in the destination and past the end of the copied string. Thus, the complexity of this operation is still quadratic. The severity of the inefficiency increases in proportion to the size of the destination and in inverse relation to the lengths of the concatenated strings.

OpenBSD `strlcpy` and `strlcat`

In response to buffer overflow attacks exploiting the weaknesses of strcpy and strcat functions, and some of the shortcomings of strncpy and strncat discussed above, the OpenBSD project in the late 1990's introduced a pair of alternate APIs designed to make string copying and concatentation safer. See [2].

	size_t strlcpy (char* restrict, const char* restrict, size_t);
	size_t strlcat (char* restrict, const char* restrict, size_t);

The main difference between strncpy and strlcpy is in the return value: while the former returns a pointer to the destimation the latter returns the number of characters copied. Another difference is that strlcpy always stores exactly one nul in the destination. To concatenate s1 and s2 the strlcpy function might be used as follows.

	size_t n = strlcpy (d, s1, dsize);
	dsize -= n;
	d += n;
	strlcpy (d, s2, dsize);

This makes strlcpy comparable to snprintf both in its usage and in complexity (of course, the snprintf overhead, while constant, is much greater).

The strlcpy and strlcat functions are available on other systems besides OpenBSD, including Solaris and Linux (in the BSD compatibility library) but because they are not specified by POSIX they are not nearly ubiquitous.

POSIX `memccpy`

POSIX also defines another function that has all the desirable properies discussed above and that can be used to solve the problem.

	void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n);

The function combines the properties of memcpy, memchr, and the best aspects of the APIs discussed above.

Like memchr, it scans the source sequence for the first occurrence of a character specified by one of its arguments. The character can have any value, including zero.
Like strlcpy, it copies (at most) the specified number of characters from the source sequence to the destination, without writing byond it. This resolves the inefficiency complaint about strncpy and stpncpy.
Similarly to (though not exactly as) stpcpy and stpncpy, it returns a pointer just past the copy of the specified character if it exists. (Recall that stpcpy and stpncpy return a pointer to the copied nul.) This avoids the inefficiency inherent in strcpy and strncpy.

Thus, the first example above (strcat (strcpy (d, s1), s2)) can be rewritten using memccpy to avoid any redundant passes over the strings as follows. Note that by using SIZE_MAX as the bound this rewrite doesn't avoid the risk of overflowing the destination present in the original example and should be avoided.

	memccpy (memccpy (d, s1, '\0', SIZE_MAX) - 1, s2, '\0', SIZE_MAX);

To avoid the risk of buffer overflow, the appropriate bound needs to be determined for each call and provided as an argument. So a concatenation constrained to the size of the destination as in the snprintf (d, dsize, "%s%s", s1, s2) call might compute the destination size as follows.

	char *p = memccpy (d, s1, '\0', dsize);
	dsize -= (p - d - 1);
	memccpy (p - 1, s2, '\0', dsize);

Choosing a Solution

The efficiency problems discussed above could be solved if instead of returning the value of their first argument the string functions returned a pointer either to or just past the last stored character. However, changing the existing functions after they have been in use for nearly half a century is not feasible.

Although it is not be feasible to solve the problem for the existing C standad string functions it is possible to mitigate it in new code by adding one or more functions that do not suffer from the same limitations. Since C's charter is codifying existing practice it is incumbent on the standardization committee to investigate whether such a function already exists in popular implementations and, if so, consider adopting it. As has been shown above, several such solutions exist.

Of the solutions described above, the memccpy function is the most general, optimially efficient, backed by an ISO standard, the most widely available even beyond POSIX implementations, and the least controversial.

In contrast, the stpcpy and stpncpy functions are less general and stpncpy suffers from unnecessary overhead, and so do not meet the outlined goals. The functions might still be worth considering for adoption in C2X to improve portabilty. See N2352 - Add stpcpy and stpncpy to C2X for a proposal.

The OpenBSD strlcpy and strlcat functions, while optimal, are less general, far less widely supported, and not specified by a standard.

The memccpy function exists not just in a subset of UNIX implemenetations, it is specified by another ISO standard, namely ISO/IEC 9945, also known as IEEE Std 1003.1, 2017 Edition, or for short, POSIX: memccpy, where it is provided as an XSI extension to C. The function was derived from System V Interface Definition, Issue 1 (SVID 1), originally published in 1985.

memccpy is available even beyond implementations of UNIX and POSIX, including for example:

memccpy in Android,
memccpy in Apple Mac OS X,
memccpy in the BlackBerry Native SDK,
memccpy in the Compaq Run-Time Library for VAX,
memccpy in the Microsoft Visual Studio C Runtime Library,
memccpy in IBM z/OS.

A trivial (but inefficient) reference implementation of memccpy is provided below.

      void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n)
      {
        void *pc = memchr (src, c, n);
        void *ret;

        if (pc)
        {
          n = (char*)pc - (char*)src + 1;
          ret = (char*)dst + n;
        }
        else
          ret = 0;

        memcpy (dst, src, n);
        return ret;
      }

A more optimal implementation of the function might be as follows.

      void* memccpy (void* restrict dst, const void* restrict src, int c, size_t n)
      {
        const char *s = src;
        for (char *ret = dst; n; ++ret, ++s, --n)
        {
          *ret = *s;
          if ((unsigned char)*ret == (unsigned char)c)
            return ret + 1;
        }
        return 0;
      }

By relying on memccpy optimizing compilers will be able to transform simple snprintf (d, dsize, "%s", s) calls into the optimally efficient calls to memccpy (d, s, '\0', dsize). Trading code size for speed, aggressive optimizers might even transform snprintf calls with format strings consisting of multiple %s directives interspersed with ordinary characters such as "%s/%s" into series of such memccpy calls as shown below:

      char *p = memccpy (d, s1, '\0', dsize);
      if (p)
      {
        --p;
        p = memccpy (p, "/", '\0', dsize - (p - d));
        if (p)
        {
          --p;
          p = memccpy (p, s2, '\0', dsize - (p - d));
        }
      }
      if (!p)
        d[dsize - 1] = '\0';

References

[1] The Sad State of C Strings by Howard Chu, 2016.
[2] strlcpy and strlcat – consistent, safe, string copy and concatenation by Todd C. Miller, OpenBSD project.

Suggested Change

To overcome the limitations discussed above and provide a general purpose solution for the safe and efficient concatenation of strings we propose to add the POSIX memccpy function to the next revision of the C standard.

Specifically, add to §7.24.2 Copying functions the following subsection. The proposed text has been adopted verbatim from the POSIX specification, except that the term byte has been replaced with the term character according to the C convention (note N1736 (DR446) and N2300 both propose to change character to byte in this section).

7.24.2.? The `memccpy` function

Synopsis

	#include <string.h>

	void *memccpy(void * restrict s1,
 	              const void * restrict s2,
	              int c,
	              size_t n);

Description

The memccpy function copies characters from the object pointed to by s2 into the object pointed to by s1, stopping after the first occurrence of character c (converted to an unsigned char) is copied, or after n characters are copied, whichever comes first. If copying takes place between objects that overlap, the behavior is undefined.

Returns

The memccpy function returns a pointer to the character after the copy of c in s1, or a null pointer if c was not found in the first n characters of s2.