Document Number: N2225
Submitter: Florian Weimer
Submission Date: 2018-03-26
Subject: Multi-threading behavior of strtok, getenv, set_constraint_handler

Summary

This document addresses data races that arise from shared global state in three library functions. It is related to papers N2226, N2227, and N2228, but the changes discussed here either lack implementation precedent (getenv, set_constraint_handler_s) or affect a function widely considered obsolescent (strtok).

  1. Subclause 7.24.5.8 (The strtok function) states:

    The strtok function is not required to avoid data races with other calls to the strtok function.

    An implementation that wants to support applications calling strtok in the presence of threads faces substantial challenges. The standard currently requires the internal state of strtok to be shared among all threads, and an application can observe this by calling strtok from several threads with proper synchronization. (Atomic operations inside strtok alone cannot eliminate the data races, because the buffer is written inside strtok and read outside of it, and the read access is not atomic.)

    The Solaris implementation of strtok appears to be non-conforming because it is implemented with thread-local state:

    The strtok() function is safe to use in multithreaded applications because it saves its internal state in a thread-specific data area. [Source]

    POSIX suggests that a thread-safe implementation of the function is possible, but does not mention that a strictly conforming program can detect whether the hidden internal state has thread storage duration by calling the strtok function from separate threads with suitable external synchronization.

    Rather than drafting language that covers all existing implementation behavior, we propose that use of the strtok function in a multi-threaded program result in undefined behavior. This is the approach already used for the signal function.

  2. For the getenv function in 7.22.4.6, it is not entirely clear how an implementation would eliminate the data race. One approach is to ensure that the returned pointer remains valid until the thread that called getenv exits; this proposal adds wording to support that approach.

    POSIX suggests that a thread-safe implementation of the getenv function is possible, but does not say how, and suggests that separate interfaces might be provided for this purpose in the future.

  3. For the set_constraint_handler_s function defined in Annex K, implementations should be able to make the current handler state thread-local if they wish, as long as a newly created thread inherits the handler state of the creating thread at the time of creation.

    If Annex K is deprecated (see N1969), then this part of the proposal can be dropped, or a solution similar to the one for strtok (undefined behavior in multi-threaded programs) could be adopted.
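
The hidden state described in item 1 can be observed even without threads. The following is a minimal single-threaded sketch, not taken from any implementation; the function name strtok_state_is_shared is hypothetical. Starting a second tokenization replaces the state left by the first, so the next call with a null first argument continues the second string. In a multi-threaded program with suitable external synchronization, the current wording requires the same result when the calls come from different threads, whereas an implementation with thread-local state would presumably keep a separate position for each thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration only.
   Returns 1 if the second tokenization replaced the hidden state left
   behind by the first one, and 0 otherwise. */
int strtok_state_is_shared(void)
{
    char first[] = "a b";
    char second[] = "x y";

    char *t1 = strtok(first, " ");   /* hidden state now refers to first  */
    char *t2 = strtok(second, " ");  /* hidden state now refers to second */
    char *t3 = strtok(NULL, " ");    /* continues second, not first       */

    return t1 != NULL && strcmp(t1, "a") == 0
        && t2 != NULL && strcmp(t2, "x") == 0
        && t3 != NULL && strcmp(t3, "y") == 0;
}
```

Because all three calls happen in one thread here, the function is expected to return 1 on any conforming implementation; the token "b" of the first string is never returned.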

Proposed Resolution

  1. In 7.24.5.8 (The strtok function), change:

    The strtok function is not required to avoid data races with other calls to the strtok function. Use of this function in a multi-threaded program results in undefined behavior. 311)

    311) The strtok_s function can be used instead to avoid data races.

    In J.2 (Undefined behavior), add:

    — The strtok function is used in a multi-threaded program (7.24.5.8).

  2. In 7.22.4.6 (The getenv function), add:

    The string pointed to shall not be modified by the program, but may be overwritten by a subsequent call to the getenv function. If the returned pointer is accessed after the thread which has called the getenv function has exited, the behavior is undefined.

    In J.2 (Undefined behavior), add:

    — Access to the pointer returned by the getenv function after the thread that originally called the function has exited (7.22.4.6).

  3. In K.3.6.1.1 (The set_constraint_handler_s function), add:

    Only the most recent handler registered with set_constraint_handler_s is called when a runtime-constraint violation occurs. It is implementation-defined whether the registered constraint handler has thread storage duration. The registered constraint handler for a newly created thread shall be the same as the registered constraint handler of the current thread at the time of creation.

    In J.3.12 (Library functions), add:

    — Whether the registered constraint handler set by the set_constraint_handler_s function has thread storage duration (K.3.6.1.1).
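
Under the getenv wording proposed in item 2, the returned string may still be overwritten by a later call to getenv and must not be accessed after the calling thread has exited. A program that needs the value for longer can copy it immediately. The sketch below assumes that approach; copy_env is a hypothetical helper, not a standard function, and it does not by itself protect against another thread modifying the environment while the copy is being made.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for illustration only.
   Returns a malloc-allocated copy of the value of the environment
   variable `name`, or NULL if the variable is not set or the allocation
   fails. The caller owns the copy and releases it with free(). */
char *copy_env(const char *name)
{
    const char *value = getenv(name);  /* storage owned by the implementation */
    if (value == NULL)
        return NULL;

    size_t size = strlen(value) + 1;   /* include the terminating null */
    char *copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, value, size);
    return copy;
}
```

The copy has allocated storage duration, so it can be handed to another thread or used after the calling thread exits without relying on the lifetime of the pointer returned by getenv.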