Clarion 12 needs a Wide STRING type that's not Null Terminated

@RZaunere in the blog what you said really hits the mark. We want to implement Unicode and have solid code:

Clarion developers care about two things when it comes to string handling: correctness and stability. In Clarion 12 we explored making Unicode capability available through the existing STRING type with a strong emphasis on minimizing impact to existing code. Since then, we’ve continued testing and listening to practical, real-world feedback about how developers want to work with strings.

In short, we need more than UString; we also need a Fixed Length Wide String… more below…

@RZaunere is the current plan for Unicode support to add only the USTRING, which is a Wide CString and Null Terminated?

The UString is exactly what’s needed to change CString to Wide and call the “W” Windows API :glowing_star:

My “real-world feedback” is that mass changing the “Fixed Length” STRING type to the Null Terminated UString type will NOT result in “correctness and stability”. There are several pitfalls that are likely to cause many small bugs.

I recall at the DevCon announcement there was a second Unicode type that was a “W” String … a UTF-16 version of the current ANSI STRING type. A Wide UTF-16 “fixed-length character string”.

Code that changes STRING to WString, both of which are fixed length, risks none of the pitfalls of Null Terminators. It’s a perfect change.

One fun thing about the new WSTRING is it could be pronounced “Schwing!” … that’s what Wayne & Garth would do :smirking_face:


2nd change … the UString(length) declaration should be changed to match CString(length) and require the developer to +1 to leave room for the Null. E.g. a 30 character name requires CString(31) and UString(31).
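
To make the +1 rule concrete, here is a minimal sketch in Python (not Clarion) that simulates a null-terminated buffer of 16-bit units. The helper names `store_null_terminated` and `read_null_terminated` are hypothetical, and the sketch assumes text where every character fits in one 16-bit unit. It only illustrates the CString(length) convention described above, not how the Clarion runtime actually stores anything.

```python
def store_null_terminated(text: str, declared: int) -> list[int]:
    """Simulate a CString(declared)-style buffer of 16-bit units.

    Hypothetical helper: one slot is always reserved for the <0>
    terminator, so at most declared - 1 characters are kept.
    Assumes one 16-bit unit per character (no surrogate pairs).
    """
    kept = [ord(ch) for ch in text[: declared - 1]]
    # Terminator, plus zero fill for any unused slots.
    return kept + [0] * (declared - len(kept))


def read_null_terminated(buffer: list[int]) -> str:
    """Return the characters stored before the first <0> unit."""
    end = buffer.index(0)
    return "".join(chr(unit) for unit in buffer[:end])


name = "N" * 30                                # a 30 character name
fits = store_null_terminated(name, 31)         # declared with the +1
clipped = store_null_terminated(name, 30)      # declared without it
```

With the +1, all 30 characters survive; without it, the last character is silently dropped to make room for the terminator.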


A post was split to a new topic: Clarion 12 way to handle UTF-8 Unicode String type?

I’m not sure of the utility of a fixed length UString. It’s a little hard to pin down since a UTF-16 character (code point) can be 4 bytes long.
I’m happy that we have something and that I can think of UString as a UTF-16 CString.
More than good enough I think.
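
For anyone who wants to see the 4-byte case, here is a small Python sketch (Python only because it is easy to run, this is not Clarion code). The helper name `utf16_units` is made up for illustration. It splits a string into its 16-bit UTF-16 units, so a code point outside the Basic Multilingual Plane shows up as two units, a surrogate pair.

```python
def utf16_units(text: str) -> list[int]:
    """Return the 16-bit UTF-16 code units of text (hypothetical helper)."""
    data = text.encode("utf-16-le")
    # Every unit is exactly two bytes, little-endian.
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


letter = utf16_units("A")             # one unit, 2 bytes
emoji = utf16_units("\U0001F600")     # two units (surrogate pair), 4 bytes

print([hex(u) for u in letter])
print([hex(u) for u in emoji])
```

So a declared length counted in 16-bit units is not always a count of characters, whichever string type holds the units.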

Terminators need to be worried about with string slicing, and we’re still a little unsure on that front.

String slicing per the blog is read only. But frankly string slicing on a USTRING is pretty much useless because of extended grapheme clusters.

I said earlier that each UTF-16 pair of bytes is not a character, but an approximation. Slicing is one of those cases where the approximation falls over. So consider string slicing on a USTRING to be a bug in your code (because it almost certainly will be).
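
A rough illustration of how the approximation falls over, sketched in Python rather than Clarion. The helpers `slice_utf16` and `is_valid_utf16` are hypothetical, and the positions are 0-based with an exclusive stop (Python's convention, not Clarion's). The sketch slices by 16-bit unit position the way a naive wide-string slice would, and shows two failure cases: cutting a surrogate pair in half, and cutting a base letter away from its combining mark.

```python
def slice_utf16(text: str, start: int, stop: int) -> bytes:
    """Slice text by 16-bit unit positions (0-based, stop exclusive)."""
    data = text.encode("utf-16-le")
    return data[2 * start: 2 * stop]


def is_valid_utf16(raw: bytes) -> bool:
    """True if raw decodes as well-formed UTF-16 (little-endian)."""
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# Case 1: "a", an emoji (two units), then "b" is four units in total.
text = "a\U0001F600b"
half_pair = slice_utf16(text, 0, 2)    # ends on a lone high surrogate
whole_pair = slice_utf16(text, 0, 3)   # keeps both halves

# Case 2: "e" followed by a combining acute accent displays as one glyph.
accented = "e\u0301"
base_only = slice_utf16(accented, 0, 1).decode("utf-16-le")
```

The first slice is not even valid UTF-16. The second is valid but has quietly changed the text, which is the harder bug to spot.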


The “utility” is that it’s identical to a fixed length STRING, which is filled with trailing spaces instead of a <0> Terminator plus garbage.

Of course common pitfall code will often result in the UString filled with spaces on the right, then the Null.
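
Here is a small Python simulation of that pitfall (a sketch only, not Clarion, and the function names are made up). It assumes that assigning a space-padded fixed length value to a null-terminated type copies the trailing spaces unless the code clips them first, which is my understanding of the pitfall being described.

```python
def fixed_assign(text: str, size: int) -> str:
    """Fixed-length style assignment: truncate, then pad with spaces."""
    return text[:size].ljust(size)


def cstring_assign(text: str, declared: int) -> str:
    """Null-terminated style assignment: keep declared - 1 chars, add <0>."""
    return text[: declared - 1] + "\0"


def cstring_value(buffer: str) -> str:
    """The value of a null-terminated buffer is everything before <0>."""
    return buffer.split("\0", 1)[0]


name = fixed_assign("Bob", 10)                     # "Bob" plus 7 spaces

# Pitfall: copy the padded value straight into the null-terminated type.
unclipped = cstring_value(cstring_assign(name, 11))

# Clipping first (rstrip stands in for a CLIP-style call) avoids it.
clipped = cstring_value(cstring_assign(name.rstrip(" "), 11))

print(len(unclipped), len(clipped))
```

The unclipped copy has a length of 10 and compares unequal to "Bob", while two fixed length values would still compare as expected because both sides carry the same padding.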

A 4 byte character (a UTF-16 surrogate pair) would work exactly the same in either type and take 2 UShorts.

DEFINITELY looking forward to this!