N 2281: Make mblen, mbtowc, and wctomb thread-safer

Submitter: Philipp Klaus Krause
Submission Date: 2018-06-14

Summary:

Make mblen, mbtowc, and wctomb thread-safer.

This is an updated version of N2246 that takes into account comments from the discussion immediately following the Brno meeting.

Justification:

At London, the committee wanted to resolve CR 498 by stating that mblen, mbtowc, and wctomb are not thread-safe, even when the encoding is not state-dependent.

However, there would be advantages to making them thread-safe for encodings that are not state-dependent. Such a change should be considered for the future C standard.

For encodings that are not state-dependent, mbtowc and wctomb can easily be implemented without using any internal state. mblen can easily be implemented without internal state even for state-dependent encodings. Making this a requirement would allow multithreaded applications to use these functions, as long as it is known that the encoding is not state-dependent. That is true for nearly all encodings and can be queried using existing functionality.
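The query mentioned above can be sketched as follows. This is a minimal illustration, not proposed wording; the helper name encoding_is_state_dependent is hypothetical. It relies only on the existing rule that a call with s as a null pointer returns nonzero if encodings have state dependency, and zero otherwise.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (name not taken from the standard or this paper).
 * Returns 1 if the encoding selected by the current LC_CTYPE category
 * is state-dependent, 0 otherwise.
 * A call with a null s also returns mblen to its initial conversion
 * state, so call this before relying on any ongoing conversion. */
static int encoding_is_state_dependent(void)
{
    return mblen(NULL, 0) != 0;
}
```

Under the current wording this query may itself race with other calls to the multibyte character functions, so an application would presumably perform it once, after setting the locale and before starting further threads.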

Currently, multithreaded applications have to either add synchronization around these functions or use the restartable functions (mbrlen, etc.) instead. Neither is a good option where speed or code size matters.

Synchronization adds overhead to every call, and multithreaded programs should avoid synchronization that is not needed.

The restartable functions are slow and big (being restartable, they need to be able to handle incomplete input). This can be seen in CR 498, and was stated by multiple attendees of the London meeting.
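For illustration, the restartable alternative that multithreaded code has to use today might look like the sketch below. The helper name count_chars_restartable is hypothetical, and returning (size_t)-1 on invalid or incomplete input is a choice made for this example only. Each caller keeps its own mbstate_t object, which is what makes the calls independent of any internal state in the library.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the multibyte characters in the
 * null-terminated string s using mbrlen and a caller-owned state.
 * Returns (size_t)-1 if s contains an invalid or incomplete sequence. */
static size_t count_chars_restartable(const char *s)
{
    mbstate_t state;
    size_t remaining = strlen(s);
    size_t count = 0;

    /* A zero-valued mbstate_t describes the initial conversion state. */
    memset(&state, 0, sizeof state);

    while (remaining > 0) {
        size_t n = mbrlen(s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;   /* invalid or incomplete sequence */
        if (n == 0)
            break;               /* null character: defensive, not expected here */
        s += n;
        remaining -= n;
        count++;
    }
    return count;
}
```

With the proposed change, the same loop could call mblen(s, remaining) directly from several threads whenever the encoding is known not to be state-dependent, without the mbstate_t object and without a lock.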

Proposed changes:

§7.22.7 from (text from the current proposed technical corrigendum for CR 498)

The behavior of the multibyte character functions is affected by the LC_CTYPE category of the current locale. For a state-dependent encoding, each function is placed into its initial conversion state at program startup and can be returned to that state by a call for which its character pointer argument, s, is a null pointer. Subsequent calls with s as other than a null pointer cause the internal conversion state of the function to be altered as necessary. A call with s as a null pointer causes these functions to return a nonzero value if encodings have state dependency, and zero otherwise.305) Changing the LC_CTYPE category causes the conversion state of these functions to be indeterminate. A call to any one of these functions may introduce a data race with a call to any other function in this subclause.

to

The behavior of the multibyte character functions is affected by the LC_CTYPE category of the current locale. For a state-dependent encoding, the mbtowc and wctomb functions are each placed into their initial conversion state at program startup and can each be returned to that state by a call for which the character pointer argument, s, is a null pointer. Subsequent calls with s as other than a null pointer cause the internal conversion state of the function to be altered as necessary. A call with s as a null pointer causes these functions to return a nonzero value if encodings have state dependency, and zero otherwise.305) Changing the LC_CTYPE category causes the conversion state of the mbtowc and wctomb functions to be indeterminate. For state-dependent encodings only, the mbtowc and wctomb functions are not required to avoid data races with other calls to the same function.