New SoftVelocity blog - Clarion 12 Beta Mid January with USTRING

Yes, a somewhat separate topic I think. The blog covers a lot of ground, but equally leaves us a lot of ground to experiment with. For example, it’s not clear what happens when you assign a STRING or CSTRING into a USTRING. Or perhaps more interestingly what happens when you assign a USTRING into a STRING etc. Presumably it makes use of system{prop:codepage} - but we’ll have to wait and see.
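
Until the beta ships, about all we can say is what a code-page conversion can and cannot preserve. Here is a minimal sketch in Python (not Clarion), assuming the runtime converts through a single-byte ANSI code page such as Windows-1252. The helper names `to_ansi` and `from_ansi` are made up for illustration, and whether Clarion substitutes '?', raises an error, or does something else entirely is unknown.

```python
def to_ansi(text, codepage="cp1252"):
    """Convert Unicode text to a single-byte code page.

    Characters the code page cannot represent become '?'.
    That replacement policy is an assumption, not known Clarion behaviour.
    """
    return text.encode(codepage, errors="replace")


def from_ansi(raw, codepage="cp1252"):
    """Interpret single-byte data as text using the given code page."""
    return raw.decode(codepage)


print(to_ansi("Euro \u20ac"))        # the euro sign fits into Windows-1252 as byte 0x80
print(to_ansi("ახალ"))               # Georgian letters do not fit and are replaced
print(from_ansi(b"\x80", "cp1251"))  # the same byte is a different letter in Windows-1251
```

The last line is the reason the code page matters in both directions: the same byte maps to different characters depending on which code page is in effect.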

Apart from the obvious u’ syntax, it’s not clear how the text inside the string should be encoded for non-ANSI characters. As you say, that could be pasted from the clipboard, or in some cases typed on the keyboard (if you have a foreign keyboard). Is it tenable to have ANSI CLW files and support Unicode characters (UTF-8 or UTF-16, either way) in hard-coded strings?

Equally, how do character codes work in a string assignment? As in
us = '<65,66,67,68,69>'
us = u'<65,0,66,0,67,0,68,0,69,0>'
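
For reference, the two byte sequences in that question are exactly the ASCII and the UTF-16 little-endian encodings of "ABCDE". A quick Python check (this says nothing about how the Clarion compiler will actually parse either literal):

```python
ascii_codes = bytes([65, 66, 67, 68, 69])
utf16_codes = bytes([65, 0, 66, 0, 67, 0, 68, 0, 69, 0])


def utf16le_bytes(text):
    """Return the UTF-16 little-endian bytes of text, without a BOM."""
    return text.encode("utf-16-le")


print(ascii_codes.decode("ascii"))           # ABCDE
print(utf16le_bytes("ABCDE") == utf16_codes)  # True

# Reading single-byte codes as if they were 16-bit units pairs them up instead:
print([hex(ord(c)) for c in b"ABCD".decode("utf-16-le")])  # ['0x4241', '0x4443']
```

So if the numbers between < and > are treated as 16-bit character codes, the second form would produce ten characters (every other one a NUL) rather than five.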

Frankly, I think a lot of this will become evident once 12.1 (or whatever it’s called) ships. And I expect we can make some doc threads here to cover these situations.

That is exactly true and something that happens often when using AI for code assistance unless you specifically guide it to provide only ANSI output.

The bad thing is that the errors generated are nonsensical.

For example the compiler can complain that you have an error at line 9000 in a source file that only has 1000 lines.
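
One way to catch this before the compiler does is to scan the source for suspicious bytes. A rough sketch in Python, assuming you read the CLW file as raw bytes; `find_non_ascii` and `has_utf8_bom` are hypothetical helper names. Bytes above 127 can be perfectly legitimate ANSI text, so the hits are only candidates to inspect.

```python
import codecs


def find_non_ascii(data):
    """Return (line, column, byte value) for every byte above 127, counting from 1."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value > 127:
                hits.append((line_no, col, value))
    return hits


def has_utf8_bom(data):
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


# A typical AI-generated line: a curly apostrophe saved as UTF-8.
sample = "  CODE\n  Msg = 'It\u2019s done'\n".encode("utf-8")

for line_no, col, value in find_non_ascii(sample):
    print(f"line {line_no}, column {col}: byte 0x{value:02X}")
```

For a real file you would pass in `open(path, "rb").read()`.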

Sure, developers are happy there is some progress…

Personally, I think SIZE should always represent the amount of memory used by the variable, not the length of the string for that special use.

Instead, I think that we should overload the LEN operator with a second parameter:

UStr USTRING(20)
  CODE
  UStr = 'String Value'
  Memory = SIZE(UStr)           !40  How much memory?
  ActLen = LEN(UStr)            !20? Traditional (trailing spaces in USTRING?)
  ClpLen = LEN(UStr, LEN:Clip)  !12  Like LEN(CLIP(UStr))
  MaxLen = LEN(UStr, LEN:Max)   !20  Max chars, like LEN(ClaStr)

Of course, one would also need to search one’s existing code to find places that have used SIZE with string parsing, and change those to LEN(S, LEN:Max).

Regardless of the chosen approach, it’s a PITA. :man_shrugging:

Mike, I think this would return 42, including the NULL terminator bytes:

Memory = SIZE(UStr) !40 How much memory?

Declaration:  USTRING(20)
Allocation:   [    40 bytes for 20 characters    ][2 bytes null]
              |<────────── 20 chars × 2 bytes ─────>|
Total Size:   42 bytes
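
The 40 versus 42 question comes down to whether the declared length includes the terminator. A small Python sketch of the arithmetic, assuming 2 bytes per character (UTF-16 code units); `ustring_bytes` is a made-up name, and which reading Clarion 12 actually uses is not known yet.

```python
BYTES_PER_CHAR = 2  # one UTF-16 code unit


def ustring_bytes(declared, declared_includes_null):
    """Bytes allocated for USTRING(declared) under the two possible readings."""
    chars = declared if declared_includes_null else declared + 1
    return chars * BYTES_PER_CHAR


print(ustring_bytes(20, declared_includes_null=False))  # 42: 20 chars plus a 2-byte null
print(ustring_bytes(20, declared_includes_null=True))   # 40: 19 chars plus a 2-byte null
```

The second reading is how CSTRING(20) works today (19 usable characters plus the terminator), just with every character doubled in size.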

I agree. Otherwise we end up with SIZE meaning different things depending where it’s used. Consistency really requires it to return the number of Bytes allocated, regardless of the variable type.

The current Help file still contains all the Unicode material that was never released. It says the compiler supports files encoded in UTF-8 or UTF-16.

The C11 Clarion compiler supports source and include files in UTF-16 (little endian) and UTF-8 encoding to allow Unicode string literals without the necessity to use explicit character codes inside the < > meta-symbols.

The new compiler simply must support Unicode source files.

Based on the description in the blog post, this type should be called CSTRINGW.

From the Clarion help, SIZE function returns “memory size in bytes”.
From the Clarion help, LEN function returns “length of string”.

USTRING  EQUATE(CSTRINGW)
us       USTRING(21)               ! 21 characters including trailing NULL char.

  us = 'გილოცავთ ახალ წელს'       ! 18 chars assigned
  x = SIZE(us)                     ! returns 42 bytes = 21*2
  y = LEN(us)                      ! returns 18
  й = INSTRING('ახალ', us, 1, 1)  ! returns 10 (position 9 is the space)
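
The character counts are easy to check outside Clarion. Python strings count code points, which for Georgian text matches the number of UTF-16 code units, so this is a fair stand-in for what a character-based LEN and INSTRING would report. It does not tell us what the Clarion functions will really return.

```python
text = "გილოცავთ ახალ წელს"

chars = len(text)                            # characters (code points)
utf16_bytes = len(text.encode("utf-16-le"))  # bytes as UTF-16, no terminator
utf8_bytes = len(text.encode("utf-8"))       # bytes as UTF-8, for comparison
position = text.find("ახალ") + 1             # 1-based, like INSTRING

print(chars, utf16_bytes, utf8_bytes, position)  # 18 36 50 10
```

Counting from 1, the second word starts at character 10, because the first word is 8 letters long and position 9 is the space.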

The current Help file, in the Unicode material under String Constants, says:

Unicode string literals can include the same { } and < > meta-symbols as their ANSI equivalents. Numbers listed between < and > meta-symbols are treated as 16-bit wide character codes.

A value between < and > can be 16 bits wide, so decimal values run from 0 to 65535. For your <decimal> example, I think that simply means removing the “,0”. For hex I assume it will allow <4 digits h>, but in Little Endian so it matches the internal hex exactly:

us = u'ABCDE'
us = u'<65,66,67,68,69>'  ! Decimal of ABCDE
us = u'<4100h,4200h,4300h,4400h,4500h>'  ! Hex

The Euro Sign is U+20AC or 8,364. I assume that page is in Big Endian so for Windows:

euro = u'<8364>'
euro = u'<0AC20h>' !Little Endian of U+20AC - ? wrong see below ?

It would be nice to have a Big Endian hex format like the common U+ or \u notation, e.g.

euro = u'<U+20AC>' !Idea of Big Endian
euro = u'<\u20AC>' !Idea for Big Endian

Edit 1/10/26
I’m probably wrong that a Unicode String <16 bit hex> is flipped to Little Endian. When you write a constant hex number LONG(20ACh) it’s Big Endian, so I would expect the same e.g. u'<20ACh>'.
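
The distinction between the numeric value and the bytes in memory can be shown with the Euro sign. A quick Python check (it says nothing about what the Clarion compiler will accept between < and >):

```python
EURO = "\u20ac"

code_point = ord(EURO)              # the number you would write in source
le_bytes = EURO.encode("utf-16-le") # how Windows stores it in memory
be_bytes = EURO.encode("utf-16-be")

print(code_point, hex(code_point))       # 8364 0x20ac
print(le_bytes.hex(), be_bytes.hex())    # ac20 20ac

# Reading the little-endian bytes back as a 16-bit number gives 20ACh again.
print(hex(int.from_bytes(le_bytes, "little")))  # 0x20ac
```

The byte order only affects storage. The value is 20ACh either way, whereas 0AC20h (decimal 44064) is a different character altogether.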

It would be nice to have a syntax for UTF-32 values that is simpler than writing out the two halves of a UTF-16 surrogate pair. E.g. a smiling emoji in C# can be coded as '\U0001F60A' but in UTF-16 is u'<0D83Dh><0DE0Ah>'
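
For anyone who has to work out the pair by hand in the meantime, the standard UTF-16 calculation is short. A Python sketch; `to_surrogate_pair` is just an illustrative name, and the printed literal uses the u'<...h>' form from this thread, which is still an assumption about what the compiler will accept.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F60A)
print(f"u'<0{high:04X}h><0{low:04X}h>'")  # u'<0D83Dh><0DE0Ah>'
```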

Has anyone heard if the template prompts are going to support Unicode???

If so, are there functions available to handle entry if, for instance, it is in Unicode?

Z just said in the newsgroups that it isn’t currently implemented in the template system, but didn’t say whether it is in the plans.

I suspect that will require a lot more work in the IDE.

He mentioned that the Embeditor could handle Unicode because it was a wrapped control, but I suspect the rest of the IDE may not.

I can see potential problems there just for the sheer fact that a lot of people still use the standard embed editor (or both as the need may be).

It will be good to see Clarion work well with Unicode one day, but I see a rough and bumpy ride to get there.

Could you please copy any Z newsgroups posts and paste them here?

Done.

Posted on a new thread here:

Does anyone know if there is going to be a predefined compiler flag for the version of Clarion 12 that has USTRING?

If so, do you know what it will be?

TIA