This page is a snapshot from the LWG issues list; see the Library Active Issues List for more information and the meaning of SG16 status.
Section: 30.3.1.2.1 [locale.category], 30.4.2.5.1 [locale.codecvt.general] Status: SG16 Submitter: Victor Zverovich Opened: 2022-09-05 Last modified: 2022-09-23
Priority: 3
Discussion:
Table [tab:locale.category.facets] includes the following two facets:
codecvt<char16_t, char8_t, mbstate_t>
codecvt<char32_t, char8_t, mbstate_t>
However, neither of those actually has anything to do with a locale, so it doesn't make sense to register them dynamically with std::locale. Instead, they provide conversions between fixed encodings (UTF-8, UTF-16, UTF-32) that are unrelated to locale encodings, except that they may coincide with the encodings of some locales by accident.
The issue was introduced when codecvt<char[16|32]_t, char, mbstate_t> was added in N2035, which gave no design rationale for using codecvt in the first place. It was likely trying to make a minimal set of changes and copied the wording for codecvt<wchar_t, char, mbstate_t>, but unfortunately didn't consider the encoding implications. P0482 changed char to char8_t in these facets, which made the issue more glaring, but unfortunately, despite the breaking change, it failed to address it.
Apart from being an obvious design mistake, this also adds a small overhead to every locale construction because the implementation has to copy these pseudo-facets for no good reason, violating the "don't pay for what you don't use" principle.
A simple fix is to remove the two facets from table [tab:locale.category.facets] and make them directly constructible.
[2022-09-23; Reflector poll]
Set priority to 3 after reflector poll. Send to SG16 (then maybe LEWG).
Proposed resolution:
This wording is relative to N4917.
Modify 30.3.1.2.1 [locale.category], Table 105 ([tab:locale.category.facets]) — "Locale category facets" — as indicated:
Table 105: Locale category facets [tab:locale.category.facets]
Category    Includes facets
…
ctype       ctype<char>, ctype<wchar_t>
            codecvt<char, char, mbstate_t>
            codecvt<char16_t, char8_t, mbstate_t>   [deleted]
            codecvt<char32_t, char8_t, mbstate_t>   [deleted]
            codecvt<wchar_t, char, mbstate_t>
…
Modify 30.4.2.5.1 [locale.codecvt.general] as indicated:
namespace std {
  […]
  template<class internT, class externT, class stateT>
  class codecvt : public locale::facet, public codecvt_base {
  public:
    using intern_type = internT;
    using extern_type = externT;
    using state_type = stateT;

    explicit codecvt(size_t refs = 0);
    ~codecvt();    [added]
    […]
  protected:
    ~codecvt();    [deleted]
    […]
  };
}
[…]
-3- The specializations required in Table [deleted: 105 [tab:locale.category.facets]] [added: 106 [tab:locale.spec]] (30.3.1.2.1 [locale.category]) convert the implementation-defined native character set. codecvt<char, char, mbstate_t> implements a degenerate conversion; it does not convert at all. The specialization codecvt<char16_t, char8_t, mbstate_t> converts between the UTF-16 and UTF-8 encoding forms, and the specialization codecvt<char32_t, char8_t, mbstate_t> converts between the UTF-32 and UTF-8 encoding forms. codecvt<wchar_t, char, mbstate_t> converts between the native character sets for ordinary and wide characters. Specializations on mbstate_t perform conversion between encodings known to the library implementer. Other encodings can be converted by specializing on a program-defined stateT type. Objects of type stateT can contain any state that is useful to communicate to or from the specialized do_in or do_out members.