I thought Windows only went up to UTF-16?
C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs | Microsoft Docs
UTF-8, as its name suggests, uses 8-bit code units.
UTF-16 uses 16-bit code units.
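To make "code units" a bit more concrete, here is a rough sketch of how many code units one code point needs in each encoding. The helper names `utf8_units` and `utf16_units` are made up for illustration, and the code assumes the input is a valid Unicode scalar value (no surrogates, nothing above U+10FFFF).

```cpp
#include <cstddef>

// Hypothetical helper: number of 8-bit code units UTF-8 needs for one code point.
// Assumes cp is a valid Unicode scalar value.
constexpr std::size_t utf8_units(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical helper: number of 16-bit code units UTF-16 needs for one code point.
// Code points above U+FFFF are stored as a surrogate pair (two units).
constexpr std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}
```

So U+20AC (the euro sign) takes three UTF-8 code units but a single UTF-16 code unit, while U+1F600 takes four UTF-8 code units or two UTF-16 code units.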
I know UTF-32 exists (UTF-32 - Wikipedia), but I don't think Windows supports it.
C++ - Unicode Encoding Conversions with STL Strings and Win32 APIs | Microsoft Docs
There is actually another Unicode encoding, which is less well-known and less used in practice than its siblings: UTF-32. As its name clearly suggests, it’s based on 32-bit code units. So a GCC/Linux 32-bit wchar_t is a good candidate for the UTF-32 encoding on the Linux platform.
This ambiguity on the size of wchar_t determines a consequent lack of portability of C++ code based on it (including the std::wstring class itself). On the other hand, std::string, which is char-based, is portable. However, from a practical perspective, it’s worth noting that the use of wstring to store UTF-16 encoded text is just fine in Windows-specific C++ code.
It’s worth noting that the C++ standard doesn’t specify the size of the wchar_t type, so while it amounts to 16 bits with the Visual C++ compiler, other C++ compilers are free to use different sizes. And, in fact, the size of wchar_t defined by the GNU GCC C++ compiler on Linux is 32 bits. Because the wchar_t type has different sizes on different compilers and platforms, the std::wstring class, which is based on that type, is non-portable. In other words, wstring can be used to store Unicode text encoded in UTF-16 on Windows with the Visual C++ compiler (where the size of wchar_t is 16 bits), but not on Linux with the GCC C++ compiler, which defines a different-sized 32-bit wchar_t type.
Here, Microsoft's wchar_t is defined as a 16-bit value:
char, wchar_t, char8_t, char16_t, char32_t | Microsoft Docs
The wchar_t type is an implementation-defined wide character type. In the Microsoft compiler, it represents a 16-bit wide character used to store Unicode encoded as UTF-16LE, the native character type on Windows operating systems.
So, all in all, I don't think we have much of a choice over whether we want that structure or not.