No I did not. I posted a straight ANSI string using ONLY the upper ANSI characters.
MyUnicode USTRING(20)
CODE
MyUnicode = 'Hi, I''m ANSI, how are you?'
Nothing out of the ordinary at all.
Thanks for the clarification, Lee.
On the doubled apostrophe point: in Clarion, '' inside a string literal is the escape sequence for a single apostrophe character. It does not end the string and start a new one.
So this:
MyUnicode = 'Hi I''m ANSI, how are you?'
is one single literal, and the value is exactly:
Hi I'm ANSI, how are you?
That is why Z’s answer makes sense.
Your literal is plain ASCII, and the only “special” thing in it is the Clarion escape for the apostrophe in I’m.
When assigned to a USTRING, you get the exact text you intended stored (internally as UTF-16), and since it is all ASCII there is no visible change, which is why he described it as “no conversion”.
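Put together, here is a minimal sketch of that point (the USTRING is sized to hold the whole literal, and the name is just for illustration):
MyUnicode USTRING(30)
CODE
MyUnicode = 'Hi I''m ANSI, how are you?'   ! '' is the escape for one apostrophe
! The stored value reads: Hi I'm ANSI, how are you?
! Every character is plain ASCII, so nothing visibly changes, even though
! each character internally occupies its own UTF-16 code unit.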
Lee, below is your code, plus a STRING I added OVER it:
MyUnicode USTRING(20)
MyOverUc STRING(42),OVER(MyUnicode)
CODE
MyUnicode = 'Hi, I''m ANSI, how are you?'
HexDump( MyOverUc )
Doing a Hex Dump of the MyOverUc STRING would show each ASCII character with a null <0> after it, i.e. the Unicode Little Endian UTF-16 Code Point for ASCII 0-127 is that value in byte 1 and <0> in byte 2 of the UShort.
MyOverUc
Hex='H<0>i<0>,<0> <0>I<0>'<0>m<0> <0>A<0>N<0>S<0>I<0> ... etc
That's what happens to characters 0-127 when converted to UTF-16. UTF-16 stores one UShort per Code Unit; it must be stored that way.
So yes a UString can be assigned an ANSI string, but it will be converted to UTF-16.
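If you want to poke at individual bytes rather than dump the whole thing, here is a rough sketch along the same lines. It assumes, as the dump above shows, that the text sits at the start of the overlaid STRING, and the declarations are the same as above:
MyUnicode USTRING(20)
MyOverUc STRING(42),OVER(MyUnicode)
CODE
MyUnicode = 'Hi'
! 'H' is ASCII 72 (48h) and 'i' is ASCII 105 (69h), so the first four
! overlaid bytes should come back as 72, 0, 105, 0 - one value byte
! plus one <0> per Code Unit, e.g.:
! VAL(MyOverUc[1]) = 72    VAL(MyOverUc[2]) = 0
! VAL(MyOverUc[3]) = 105   VAL(MyOverUc[4]) = 0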
Carl, that is a great way to show it.
That is exactly the distinction that keeps getting blurred in this thread: the visible characters are unchanged for ASCII, so it feels like “no conversion”, but the underlying representation has to widen into UTF-16 code units, which is why you see the interleaved null bytes when you overlay and hex dump it.
The conversion only becomes observable at the character level when you get above 127 and codepage rules start influencing what bytes map to what Unicode code points.
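As a hedged sketch of that above-127 case, reusing the same overlay declarations and assuming the STRING-to-USTRING assignment goes through the Windows-1252 ANSI codepage (that codepage is an assumption here, not something I have verified on every setup):
MyAnsi STRING(1)
MyUnicode USTRING(20)
MyOverUc STRING(42),OVER(MyUnicode)
CODE
MyAnsi = CHR(128)    ! byte 80h, which is the euro sign in Windows-1252
MyUnicode = MyAnsi   ! assumed to convert via the ANSI codepage
! The euro sign is Unicode U+20AC, so unlike the ASCII case the stored
! Code Unit no longer matches the original byte:
! VAL(MyOverUc[1]) = 172 (ACh)   VAL(MyOverUc[2]) = 32 (20h)
That mismatch between the original byte (80h) and the stored code unit (20ACh) is exactly where the codepage starts to matter.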
Kind of hard for me to read that since I don’t have a copy of Clarion that understands USTRING.
I was going by the reply I got from Z… nothing else to work from on this end.
If it does what I expected it to do, then since it's a UNICODE string, problem solved.