No, it wouldn't. Given an attribute you would still be able to create fixed-length strings just for this purpose, but keep in mind that if you try to OVER that onto a Unicode string, well, good luck.
BOTH!
It's been 4 years since my last contract, and in the past 2 years I've sold 9 upgrades; 4 of those were for Clarion 12.
It also has to do with the short-term memory loss that I've struggled against over the past 20 years since my triple bypass. It's not easy scanning lines of code and suddenly wondering what you just read. It's been a fight I'm tired of fighting.
Oh, and with all that, it simply isn’t fun anymore.
72 and living only on Social Security which ain’t a lot!
As to the drag it would add to files and keys, that’s up to the driver developer. The world has moved on and Clarion is not catching up very well… that’s a shame.
And code pages are old world; that's why Unicode was created… every character is in the definition. Yes, there are encoding options. Personally I think UTF-16 would be the best, since the core of Windows is written for UTF-16. But I'm certain there will be several, EXPECTED, refusals on that point.
My intent, beyond answering a lot of questions about why Unicode isn't there yet (e.g. reports), was to try to get others away from the 1:1 ratio of characters to bytes. From my point of view THAT'S what's going to cripple Clarion. And I'm still waiting for someone to show me a string-based command in Clarion that WOULDN'T work if you look at the values returned as characters instead of bytes. Even string slicing would work since, as is, it IS based on character counts; but that's old-world 1:1, and that's going away.
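For instance, a slice as it works today (a trivial sketch; under this reading the indices would simply mean characters rather than bytes):

```clarion
  PROGRAM
  MAP
  END
Name  STRING(20)
  CODE
  Name = 'Clarion'
  ! Name[2:4] yields 'lar'. Today those indices are byte offsets AND
  ! character offsets, because ANSI is 1:1; the claim is that the same
  ! expression stays valid if the RTL counts characters instead.
  MESSAGE(Name[2 : 4])
```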
MyString STRING(80),ANSI
MyBytes BYTE,DIM(80),OVER(MyString)
Or NOT over; just on its own, it does the exact same thing.
They remain viable even if the entire RTL switches to character counts instead of bytes.
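Making that concrete, a minimal runnable sketch of the byte-level view (the proposed ,ANSI attribute is hypothetical, so it's left off here):

```clarion
  PROGRAM
  MAP
  END
MyString  STRING(80)                   ! fixed-length buffer, 80 bytes
MyBytes   BYTE,DIM(80),OVER(MyString)  ! the same 80 bytes, viewed numerically
  CODE
  MyString = 'AB'
  ! MyBytes[1] is 65 ('A') and MyBytes[2] is 66 ('B'): raw byte access
  ! that doesn't care how the RTL counts "characters" elsewhere.
  MESSAGE('First byte = ' & MyBytes[1])
```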
Completely agree. Reports absolutely need to be EMF. STRING, CSTRING and ASTRING types can all automatically convert when being added to the EMF. USTRING (and BSTRING?) can be injected as-is.
There’s nothing preventing reports from working with USTRING in the mix.
That's not entirely accurate. The number for nchar and nvarchar refers to the number of byte pairs (i.e. the string is n*2 bytes). The following is from the MSSQL docs; implementation may vary between database engines:
"For UTF-16 encoding, the storage size is still two times n bytes, but the number of characters that can be stored might be smaller than n."
40 sounds like an arbitrary length, so I'd dig into the source of that number. I'd also consider the nature of the data being entered. For purposes of this answer, we can assume the language measures the size in code points (as SQL does). Most languages with fixed-length strings do this.
If the field were to contain, say, a human name, then I'd provision at least 2 code points per character, plus I'd add 20% more. So in this case, String(100).
If the text could contain emojis (say it's a comment field, reason field, etc.), and I have to guarantee space for any 40 characters, then I need at least 12 code points per character, so String(960).
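Sketched as declarations (the sizes come straight from the reasoning above; the code-point semantics are my assumption, not current STRING behavior):

```clarion
LastName  STRING(100)  ! 40 "characters" x 2 code points each, plus ~20% headroom
Comment   STRING(960)  ! guaranteed room for any 40 characters, emoji included
```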
Of course, the root of the problem is the idea of a "fixed number of characters", coupled with the customer (and most programmers) using "characters" as a unit of measure.
Put another way, the question is moot since 40 is an arbitrary number. Most string lengths are. Why use 40 for a LastName, why not 41? Or 42? Or 38? Letting go of the arbitrary number lets you think about the text and what will be stored in there. Then you massively over-provision anyway, because you (much less your customer) have never been to Sri Lanka.
Ideally of course, what we need are variable length strings. But that is probably a bridge too far for recommending to SV at this stage.
Regardless, the principle of backward compatibility insists that old things are left as-is, and new things are added to introduce new concepts.
Arbitrary lengths? What aren't, when it comes to data storage? I used a number, pardon me, sir!
Years ago, I knew someone who was a strong advocate of Pick databases.
There were various implementations (Universe, Revelation, OpenInsight, etc.) that I think came from the original work of Dick Pick in the US Army around 1965.
There was a whole Pick operating system at one stage, while later the various Pick-style databases ran on the more common OSes. Anyway, these were extremely flexible in that you could have fields or attributes of any size and any number.
I guess the modern equivalent might be storing XML or JSON in a database rather than fixed-length fields/columns.
Indeed the length is arbitrary. Since it is arbitrary, it doesn't need to be exact.
That's important. If you can accept cases where the limit works out to fewer than 40 characters (like if you set it to be 40 code points), then you'd be fine. If the requirement is an absolute "any 40 characters", then that's a very different requirement.
Fortunately most "number of characters" limits are arbitrary, which means redefining them as "number of code points" is not a big deal.
When it comes to data storage, variable length is preferable to some "number of characters" count, although that does not map well to the current Clarion types.
SQLite today treats all strings as variable length. There are no string length restrictions. In other databases you might use NVARCHAR(MAX) or similar.
Equally SQLite allows any data type in any cell, regardless of column type. This is not especially useful though and can make client-side code (aka drivers) somewhat more complicated.
So every “need” your client has for data is always arbitrary?
Who do you write software for?
If a client says they need 21 characters for a certain entry because, wait for it, it NEEDS to be that long, do you consider that arbitrary?
Consider an invoice system for a store: how do you decide the maximum length available for such things as last names or addresses? They are ARBITRARY and defined by the developer. Well, that holds true for most that I know.
You’re not discussing anything, you’re simply dismissing everything without discussion.
I need to feed my critters (birds & squirrels); it's 28°F right now with flurries!
Oh, and I’m still waiting for someone to give me a string operation in Clarion that wouldn’t continue to work on character counts instead of bytes.
For some reason I feel as though I'm on the verge of getting slapped on the wrist. If so, please make it the right wrist; I have pseudogout, again, in the other one.
Time for this old curmudgeon to say “good day!”
I'd dig into where that number came from. Typically, in my experience, customers err on the low side when estimating field lengths.
Regardless, once you’ve determined the number of characters, you then need to translate that into code points.
If Clarion declared it in "characters", then it would need to assign 24 bytes of RAM per character (a worst case of around 12 code points at 2 bytes each).
If Clarion declared it in code points, then it assigns 2 bytes per code point. In that case the programmer could decide the number of code points required, based on the characters likely to be used.
Most functions (even slicing) could be made to work. Performance, though, would be poor. Since characters have variable length, every function would need to parse from the front of the string, so something like SUB becomes a linear scan. String slicing could be made to work (with sufficient work in the compiler), but it would go from very fast to very slow.
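A rough sketch of why, assuming UTF-16 code units held in a USHORT array (the names and layout here are illustrative only):

```clarion
  PROGRAM
  MAP
  END
Units    USHORT,DIM(80)  ! UTF-16 code units (illustrative buffer)
UnitCnt  LONG            ! code units actually in use
Target   LONG            ! which character we want
i        LONG
Chars    LONG
  CODE
  Target = 10
  ! To find the Nth character you must walk from the front: a character
  ! may be one code unit or a surrogate pair (two units), so there is no
  ! way to jump straight to character N the way SUB can with bytes today.
  i = 1
  LOOP WHILE i <= UnitCnt AND Chars < Target
    Chars += 1
    IF Units[i] >= 0D800h AND Units[i] <= 0DBFFh   ! high surrogate?
      i += 2                                       ! pair = one character
    ELSE
      i += 1
    END
  END
```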
One could expect that code that used string functions would run around 100 times slower. Processes that take roughly a minute now might take hours to complete.
Of course any use of OVER would fail without code changes. Every existing import or export (CSV, XML, JSON, HL7 etc) would fail without code changes.
INSTRING would need fairly major structural changes, including the ability to normalize strings before searching. MATCH would be equally affected, perhaps more so.
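To make the normalization point concrete (the code-unit values are standard Unicode; everything else is a hand-rolled illustration):

```clarion
  PROGRAM
  MAP
  END
Precomposed  USHORT,DIM(1)  ! 'é' as a single code point, U+00E9
Decomposed   USHORT,DIM(2)  ! 'e' (U+0065) + combining acute (U+0301)
  CODE
  Precomposed[1] = 00E9h
  Decomposed[1]  = 0065h
  Decomposed[2]  = 0301h
  ! Both sequences display as "é", yet a code-unit-for-code-unit INSTRING
  ! or MATCH will never equate them; the RTL would have to normalize both
  ! sides (e.g. to NFC) before comparing.
```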
Fixed string assignments in code (s = 'bruce') would be a big problem. The CLW would presumably need to be converted to UTF-16, which would be problematic where '' syntax is used.
Of course code can be changed, so everything is fixable given enough time. Python discovered (in the 2-to-3 transition) that programmers would prefer not to do this work.
Hi Lee, @lodestar
Thanks for the discussion — there’s clearly passion behind your ideas, and Unicode is something many of us care deeply about.
I just want to clarify something:
The disagreement here isn’t about whether Unicode should exist in Clarion — everyone in the thread agrees on that. The friction is around how to get there without breaking every existing Clarion app, template, 3rd-party product, and binary protocol that relies on STRING = raw bytes.
When you say:
“I’m still waiting for someone to give me a string operation that wouldn’t work…”
multiple people have answered — OVER, binary slicing, imports/exports (CSV/XML/JSON/etc), and performance implications when indexing non-fixed-width encodings. Those aren’t theoretical. They’re real breakpoints in production code and in the compiler/runtime model.
Your proposal assumes:
- change the RTL internally
- no breaking impact on user code
Bruce and others are pointing out that doing that implies:
- breaking backward compatibility
- redefining how STRING behaves (no longer bytes)
- changing the compiler semantics (not just the RTL)
Those are not small changes.
It’s fine to disagree on the strategy — but dismissing concerns as “byte-thinking” doesn’t make the technical problems go away.
The thread works best when arguments address the points being raised rather than the people raising them.
Let’s keep it technical and respectful so everyone benefits from the discussion.
Mark,
From the beginning I was stipulating that STRING(), for purposes of bytes, would need an attribute so the RTL would know it’s a fixed length set of bytes. Something as innocuous as ,ANSI.
From my POV the proper way to do that is BYTE,DIM(),OVER.
Basically I was attempting to point out that the entire RTL and file drivers need to be rewritten for Unicode, just like Windows. Any character-based variable, regardless of type, if not marked as ANSI (including file formats in the DCT), would be converted to Unicode, processed, and decoded back to Ansi… in the same manner that Windows uses for "A"nsi function calls, i.e.:
Convert to Unicode, call the “W” function and decode back to Ansi once that call returns.
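As a sketch of that round trip done by hand in Clarion today (these Win32 prototypes are approximations; real declarations vary between codebases):

```clarion
  PROGRAM
  MAP
    MODULE('kernel32')
      MultiByteToWideChar(LONG CodePage, LONG Flags, *STRING Src, LONG SrcBytes, |
        LONG DstAddr, LONG DstUnits),LONG,PASCAL,RAW,NAME('MultiByteToWideChar')
      WideCharToMultiByte(LONG CodePage, LONG Flags, LONG SrcAddr, LONG SrcUnits, |
        *STRING Dst, LONG DstBytes, LONG DefChar, LONG UsedDef),LONG,PASCAL,RAW,NAME('WideCharToMultiByte')
    END
  END
CP_ACP   EQUATE(0)        ! the system ANSI code page
AnsiBuf  STRING(80)
WideBuf  USHORT,DIM(80)   ! UTF-16 code units
Units    LONG
Bytes    LONG
  CODE
  AnsiBuf = 'Hello'
  ! 1) widen the ANSI bytes to UTF-16
  Units = MultiByteToWideChar(CP_ACP, 0, AnsiBuf, 5, ADDRESS(WideBuf[1]), 80)
  ! 2) ...call the "W" API with WideBuf here...
  ! 3) narrow the result back to ANSI
  Bytes = WideCharToMultiByte(CP_ACP, 0, ADDRESS(WideBuf[1]), Units, AnsiBuf, SIZE(AnsiBuf), 0, 0)
```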
I cannot foresee a future in Clarion if it's supposed to handle Unicode unless this approach is adopted. Would it take a massive rewrite? No doubt about it. But partial support, especially if you need to support Ansi legacy data AND Unicode, is going to be an impossible nightmare to work with.
Consider using INSTRING() where one argument is Ansi and the other is Unicode???
If the RTL “knows” that a string is Ansi it can handle the conversion for you before actually calling INSTRING() and everything continues to work.
Bottom line: a few changes would be required for such uses as OVER, and with the same attribute you can make a file structure that supports legacy Ansi data storage.
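For example, a file structure sketch (the ,ANSI attribute is the proposal here, not shipping syntax, so it appears only as a comment):

```clarion
Customers  FILE,DRIVER('TOPSPEED'),PRE(Cus),CREATE
Record       RECORD
Name           STRING(40)   ! proposed: STRING(40),ANSI to pin these bytes as legacy data
Notes          STRING(200)  ! unmarked: the rewritten RTL would treat this as Unicode text
             END
           END
```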
In the case of an OVER I'm rather certain that, in some cases, you'll need to expand the contents when calling string functions; but for static byte "strings", with one simple attribute, the RTL would NOT be required to encode or lengthen the variable.
Often I have a problem describing exactly what I'm thinking, and that's always frustrated me. Even in kindergarten they thought I had a major mental problem when given paper and crayons to "draw a house." Most would end up with a box and triangle and maybe a chimney. Mine looked nothing like that, and even my Mom and Dad (a psychiatrist) thought something was wrong until my Mom sat me down at the breakfast table and asked me to describe my house.
It only took seconds for her to discern I was drawing floor plans, specifically of the house I lived in. It did have doors, windows and even a fireplace, just from a different POV. She even asked what the blue and red lines were - water and electricity, what else!
Anyway, you can lock this thread since I have nothing else to share. I’ll consider my wrist slapped.
Using an attribute is just another way of creating a separate data type. Except:
An attribute is opt-out, meaning the code is broken by default until fixed, whereas a Type is opt-in, meaning that Unicode can be explicitly implemented where needed.
Equally, all RTL (and programmer) functions can easily have STRING and USTRING forms. This is not possible with an attribute. We currently deal with many string types; one more is no big deal.
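For example, a prototype-only sketch of such a pair (USTRING is the hypothetical new type, and Shorten is an invented name):

```clarion
  MAP
    Shorten(STRING Text),STRING      ! existing byte-based form keeps working
    Shorten(USTRING Text),USTRING    ! Unicode overload - same name, resolved by Type
  END
```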
Thus having a separate type has all the benefits of an attribute, plus allows for more control in the code.
Put another way, the way Clarion (RTL and Programmer code) does automatic type conversion is based on Type, not attributes.
Obviously the RTL can handle strings in all functions, ANSI or Unicode. That's not in question. The question is how to tell the strings apart, and that's best done with a Type, not an attribute.
Then the next question becomes the default behavior of STRING: should it remain ANSI, with a new type (say USTRING), or vice versa. Under the principle of not breaking existing code, I favor leaving the STRING type as is, and adding a new type for new behavior.
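In declaration form, the opt-in idea looks like this (USTRING being the proposed type, not something you can compile today):

```clarion
OldName  STRING(40)    ! unchanged: bytes, every existing line keeps compiling
NewName  USTRING(40)   ! hypothetical opt-in type with Unicode semantics
```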
Of course, the report engine can trivially deal with both; that's no hindrance at all.
Of course, a full-fledged transition to Unicode (and 64 bits) requires significant changes to the RTL and the compiler. I understand what Lee White is writing about. This raises questions.
- Who are the people who can do this work?
- Who are the people who will pay for this work?
- What percentage of Clarion developers actually need Unicode (and 64 bits) in their daily work, and not just because MS has been moving in this direction for 25+ years?
- What is better in 2025 for application developers: to invest in large-scale changes in Clarion, or to move their projects to another tool where all this has long existed?
We can work with Unicode now. The easiest way is to use external libraries, for example the WebView2 control. If religion does not allow you to connect huge libraries (.NET, Chromium) to the project, then you can use the Win API. Mike Duglas showed string, prompt and list controls in Unicode in Clarion windows. The following screenshot shows Unicode in the entry fields (using the Scintilla control, an open-source C library).
Does this graphically represent what you mean?
An interesting read.
Ok, one last post on this OVER concept.
OVER is NOT a string handler; it's just another way to access the same memory as defined by some other variable.
I'd also hope that SV, if the RTLs are being rewritten for Unicode, goes with UTF-16, which would keep them in tune with Windows.
If SV tackles Unicode as Windows did, it should be possible to check one compiler directive that makes it all moot. If Unicode support isn't enabled in your APP then the RTL would handle the data exactly how Windows does: to Unicode - call a WIDE function - back to Ansi.
Every command that is already working with Ansi, i.e. 1:1, would continue to function just as it does today. You could recompile all your Ansi-based APPs and they would work, period.
Granted, if you enable Unicode there would be a need for, OMG, slight changes in your coding to indicate when a character-based variable must remain 1:1 for whatever reason: dare I say it, byte counts. But consider that all the string-related commands in Ansi, our current language, that return a numeric value would STILL return the same value; it can be considered a byte offset, but should always be thought of as a character count, since Unicode, especially UTF-16, is not 1:1.
Think about it this way: in every program we've written since forever, the Lowest Common Denominator was Ansi, i.e. 1:1. For Unicode support that LCD becomes Unicode, just as it is within Windows.
Lowest Common Denominator - something to consider.
BYE, BYE!
Which reminds me that I use Tracker PDF, which parses the WMF to make the PDF. I would think their API has functions that work with EMF also. I will probably need to rework Craig's code and/or templates.
Yes, by the "Refactored String", but there is zero information on anything about that. I would think it will work kind of like a dynamic TChar, in that it will be either ANSI or Wide depending on what it was assigned.
The USTRING was going to be a standard Windows Wide string of UTF-16 USHORTs that is <0,0> terminated. That was going to be handy for calling the W API. I think that could have been a bit of a problem for many. I would guess the CSTRING will get the same refactoring so it can be A or W.
There must be a built-in Wide String type for API work. BSTRINGs need to be supported in Group, Class and Queue.
There was also going to be a WSTRING type that was not null terminated, so like a Clarion STRING. That got cut somewhere, which I think was a mistake. We are all very used to working with a string type that is not null terminated.
There is a Clarion Live webinar on Refactoring the Invoice example that you can watch.
IIRC Bruce wanted to make all the names like 120 characters, or maybe 255. He'd say the length really does not matter; disk space is almost unlimited…
My answer was that I have to print those names on an Invoice form that shows in a window envelope, so yes, 120 bytes is a problem. Connecting to shipping systems like UPS does have length limits, probably related to what they can print on forms and labels.
Working on Payroll systems the IRS and Banking Direct Deposit have rather short limits.
IOW just because you can doesn't mean you should. It has to make sense in the user-scape.
I put up a demo app using the std Clarion report, resizing the string font size if the contents were too many characters, and parsing some of those characters onto a second string. This way you could have long string fields, and they would just resize down on paperwork and sometimes split over two lines.
Thought it was on Icetips, but couldn't find it after a quick scan.
