StringTheory RegEx help finding Date in Text

Jane · October 14, 2025, 11:07pm

Thank you, Mark!

Very eye-opening.

Pasting the code each time makes a lot of sense.

And just because the bot tries to be chatty, it doesn’t mean that I should be anything other than direct and specific and simple.

vitesse · October 16, 2025, 3:49am

Thanks for testing that, Jane.

At first I was surprised at the results as I was expecting the new version to be faster not slower.

On looking at the code I think I can see what is happening.

The declaration of the ST object in the old V1 version is commented out in the function, and it is using a global declared right at the top. This is actually a good idea, as it means the construct and destruct are only done once and not every time the procedure is called. There is some small overhead in setting up the ST object every time, so when this is done 5000 times in a loop it starts to add up.

What I usually do in this scenario is add “static,thread” to the stringTheory declaration so the object persists from one call to the next. So I have now done an edit on the code to reflect that (see above in this thread). The advantage of doing it this way (rather than a global or module data) is that the scope is limited to the ExtractDate procedure so is “safer”.

There are a couple of caveats with this approach that you need to be aware of:

The first is that you should not do this if the procedure is recursive, i.e. calls itself either directly (A->A) or indirectly (A->B->C->A). In that case, each call most likely needs its own object rather than “sharing” the one object.

The other caveat is that where you have enormous data in your object and memory is tight, you might want to do a hard free, i.e. st.free(true,true), to release memory prior to returning. Usually, however, it is better not to do that and to simply leave the memory sitting there so you can re-use it on the next call without having to re-allocate it.
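For anyone who wants to experiment with the idea away from Clarion, here is a minimal sketch in Python (not Clarion, and not the ExtractDate code). It assumes a hypothetical helper class, Scratch, standing in for an object that is costly to construct, and uses threading.local() as a rough analogue of a “static,thread” declaration: one instance per thread, built on first use and kept between calls. All names are made up for illustration.

```python
import threading


class Scratch:
    """Hypothetical stand-in for an object that is costly to construct."""

    constructed = 0  # counts how many times the constructor has run

    def __init__(self):
        Scratch.constructed += 1
        self.value = ""

    def set_value(self, text):
        self.value = text

    def free(self):
        # Rough analogue of a hard free: drop the held data before returning.
        self.value = ""


_local = threading.local()  # one slot per thread


def get_scratch():
    """Return this thread's Scratch, constructing it only on first use."""
    obj = getattr(_local, "scratch", None)
    if obj is None:
        obj = Scratch()
        _local.scratch = obj
    return obj


def shout(text):
    # Not recursive, so re-using one object across calls is safe here.
    st = get_scratch()
    st.set_value(text)
    return st.value.upper()


for _ in range(5000):
    shout("14 October 2025")
```

After the 5000 calls the constructor has run once on this thread. A recursive procedure would need a fresh Scratch per call instead, which is the first caveat above.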

This is all pretty moot now as I can see you have moved on with Claude providing extra code for ISO dates and so on. I did enjoy reading your interactions with Claude and certainly could relate to John’s “mix of awe and frustration”! Mark’s analysis and suggestions are very helpful so thanks for sharing those too Mark.

I will try to have a look at the Claude code soon and make some suggestions for improvement. I am overseas ATM and heading back to Australia for the weekend so will no doubt have some spare time at the airports on the way!

seanh · October 16, 2025, 10:16am

Yeah, treating it like a genius-level stupid assistant with the memory of a goldfish appears to help a lot.

RchdR · October 19, 2025, 8:20am

Without getting into a flame war (appropriate description?), or too much thread drift comparing different AIs as if they were an arsenal of the latest weapons tech: normal Win 11 Co-Pilot, aka the free one, is greatly improved if you open the Side Bar, click on the Account circle with the head-and-shoulders icon, then click Settings, Privacy, and switch Personalisation and memory to On.

This way normal Co-Pilot will remember stuff from previous conversations, which helps enormously over time.

Where I’ve used it to convert enumerations, arrays etc. from MS API docs (e.g. Service Security and Access Rights - Win32 apps | Microsoft Learn) into my own Equate naming scheme, it remembers this and then builds it into code it converts from C/C++ or other languages into Clarion.

eg

CreateMyService PROCEDURE()
SERVICE_NAME     STRING('MyClarionService')
DISPLAY_NAME     STRING('Clarion Test Service')
BINARY_PATH      STRING('C:\Path\To\YourService.exe')
SCMHandle        UNSIGNED
ServiceHandle    UNSIGNED
Result           UNSIGNED

  CODE
  ! Open a handle to the Service Control Manager
  SCMHandle = OpenSCManager(NULL, NULL, ISEQ:SCM:SC_MANAGER_CREATE_SERVICE)
  IF SCMHandle = 0
    MESSAGE('Failed to open SCM: ' & GetLastError())
    RETURN
  END

  ! Create the service
  ServiceHandle = CreateService(SCMHandle, SERVICE_NAME, DISPLAY_NAME, ISEQ:SCM:SERVICE_ALL_ACCESS, ISEQ:SCM:SERVICE_WIN32_OWN_PROCESS, ISEQ:SCM:SERVICE_DEMAND_START, ISEQ:SCM:SERVICE_ERROR_NORMAL, BINARY_PATH, NULL, NULL, NULL, NULL, NULL)

  IF ServiceHandle = 0
    Result = GetLastError()
    MESSAGE('CreateService failed: ' & Result)
    CloseServiceHandle(SCMHandle)
    RETURN
  END

  ! Success: release both handles before returning
  MESSAGE('Service created successfully!')
  CloseServiceHandle(ServiceHandle)
  CloseServiceHandle(SCMHandle)

ISEQ:SCM:SC_MANAGER_CREATE_SERVICE is my Equate naming scheme, which it remembered from a previous conversation (when it does remember).

Just a simple example to hand, but quite handy if I might say so.

MS has also subtly changed their API documents, which Co-Pilot will get right, but you will get wrong if you just use the API webpages.

e.g. if you find the CreateServiceA function (winsvc.h) - Win32 apps | Microsoft Learn page
in the sidebar of the OpenSCManagerA function (winsvc.h) - Win32 apps | Microsoft Learn page,

this parameter

[in, optional]  LPCSTR    lpBinaryPathName,

is not optional.

Co-Pilot will always use it in examples, and the API call fails if you don’t supply it. It’s not optional.

These are just some recent changes I’ve started to notice in the MS API docs, which will probably force you into using their (or some other) AI agent.

vitesse · October 20, 2025, 12:23am

Well I have had a look at Claude’s code that Jane posted (Version 32), and made some changes/improvements.

The original code:
Jane ExtractDate v32 orig.txt (13.7 KB)

The revised code:
Jane ExtractDate v32 GCR.txt (10.7 KB)

From my point of view the main changes were:

removed unnecessary clips - Claude had 47 clips, revised code has none.
removed lots of unnecessary code, including checks for numeric(). Overall the code is 100 lines shorter despite having more bounds checks.
removed unused and unnecessary variables and parameters
did code only where necessary. For example the original check for a valid date always did the (expensive) check whether the year was a leap year or not. This is now changed so that the leap year check is only done if the date is February 29.
got rid of the use of sub() and used string slicing instead. Every Clarion string is an implicit array of bytes. This is somewhat controversial as sub is definitely safer - years ago someone told me that the average Clarion programmer should not use string slicing because there is too much risk that they will get it wrong and get a subscript out of bounds error - often by being “off by one” at the start or end of the array. Still, providing you are careful I feel it is usually better and more efficient to use slicing.
there was repeated code at the end of loops in case we were still in a token. Repeated code is bad so this has been eliminated by going around the loop an extra time at the end.
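To illustrate the “did code only where necessary” point, here is a minimal sketch in Python rather than Clarion. It is not the GCR code; the function names are made up, and it assumes the day counts are held in one comma-separated string. The leap-year arithmetic only runs when the date is February 29.

```python
# Days per month held as one comma-separated string, split once at load time.
DAYS_IN_MONTH = [int(d) for d in "31,28,31,30,31,30,31,31,30,31,30,31".split(",")]


def is_leap_year(year):
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_valid_date(year, month, day):
    """Range-check month and day; the year itself is not range-checked here."""
    if not 1 <= month <= 12 or day < 1:
        return False
    if day <= DAYS_IN_MONTH[month - 1]:
        return True  # common case: no leap-year arithmetic at all
    # Only 29 February can still be valid at this point.
    return month == 2 and day == 29 and is_leap_year(year)
```

Every date except February 29 is decided by a single table lookup, so the more expensive test is skipped for almost all input.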

I have kept the form of the code more-or-less the same. Sure I would have done things differently myself (putting aside the fact that I would obviously use StringTheory). For example, there is a token array of size/dimension 50. I would probably use a queue so there is no limit. Also there was no checking if the end of that array was reached so any text with more than 50 tokens/words would go out of bounds. (I corrected that). To be honest I would probably have ditched the array or queue completely and checked for a valid date as we went along. That way if the date is at or near the start we could immediately short circuit and not keep dividing the rest of the text up into tokens unnecessarily.
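The “extra time around the loop” and “check as we go” ideas can be sketched together. This is a Python illustration under my own assumptions, not the revised ExtractDate code: the names first_token_matching and looks_like_iso_date are hypothetical, and the date check is deliberately crude.

```python
def first_token_matching(text, is_wanted):
    """Return the first whitespace-delimited token accepted by is_wanted, else None."""
    start = None  # index where the current token began, or None if between tokens
    # Go one position past the end so the final token is closed by the same
    # branch that closes every other token (no repeated code after the loop).
    for i in range(len(text) + 1):
        at_boundary = i == len(text) or text[i].isspace()
        if at_boundary:
            if start is not None:
                token = text[start:i]  # slice out the token in one step
                if is_wanted(token):
                    return token  # short circuit: stop scanning the rest
                start = None
        elif start is None:
            start = i
    return None


def looks_like_iso_date(token):
    # Crude, hypothetical check: YYYY-MM-DD made only of digits and dashes.
    parts = token.split("-")
    return (
        len(parts) == 3
        and [len(p) for p in parts] == [4, 2, 2]
        and all(p.isdigit() for p in parts)
    )
```

Because nothing is stored in a token array, there is no 50-entry limit to overrun, and a date near the start of the text ends the scan immediately.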

Anyway I was curious what Claude would make of the changes so I uploaded the original (Jane and Claude’s Version 32) and the modified GCR versions of the files and got Claude to:

Examine each procedure in both files and list the specific differences for each procedure
Provide a summary of the key changes

The list of differences as generated by Claude can be seen at

https://claude.ai/public/artifacts/8f3a4b55-08f2-4b17-8f30-8a4cd7a1e43a

In Claude’s chat it/he/she/they said:

I’ve created a comprehensive comparison of the two Clarion files. Here are the key takeaways:

Main Differences:

Function Signatures: The GCR version simplified the validation procedures to only return TRUE/FALSE, removing the reference parameters that originally returned parsed values.

Performance Optimization: The GCR version uses direct character/string indexing (DateStr[DateLen] = Char) instead of string concatenation with CLIP(), which is significantly more efficient.

Loop Pattern: Changed from traditional variable tracking to a consistent pattern using size(pText) + 1 for loop bounds.

IsValidDate Rewrite: The most dramatic change—replacing a 40+ line CASE statement with an elegant OVERLAY technique using a comma-separated string of day counts.

Code Consolidation: Removed intermediate string variables throughout, making the code more direct and readable.

Variable Management: Extensive use of the AUTO keyword and elimination of unused variables.

Bottom Line: The GCR version appears to be a significant performance and maintainability improvement over the original, with better memory efficiency and cleaner logic flow.

I was curious to know what instructions could be given to Claude in the future such that it generated code more like the revised code rather than the original code, so I prompted:

The original code was generated by you (Claude.AI) for another Clarion programmer. I was wondering if you could please have a long think and come up with instructions that could be given to you when generating Clarion code in the future so that the result is closer to the GCR version.

It didn’t have a “long think” as it came back almost straight away - this is what it replied:

I’ve created a comprehensive prompting guide based on the differences between the two versions. This document provides:

Specific optimization techniques that the GCR version uses—like direct string indexing instead of concatenation, and length tracking instead of repeated CLIP() operations

Function design principles that lead to cleaner, more efficient code—particularly avoiding reference parameters and consolidating validation logic

Data structure techniques like using STRING OVERLAY for lookup tables instead of massive CASE statements

Anti-patterns to avoid—with concrete examples of what not to do

A practical checklist you can include with your requests to Claude

The key insight from comparing the two versions is that the GCR version makes better use of Clarion’s capabilities: direct byte-level string operations, length-based tracking, and data-driven design. By giving Claude these specific instructions, you should get code that’s much closer to production quality from the start.

You could also bookmark this guide and simply say in future requests: “Please generate this Clarion code following the performance and design guidelines in [this document]” and reference it.

https://claude.ai/public/artifacts/f2c2ad34-3683-4605-99ec-022429c38ea2

Anyway, I hope the revised ExtractDate program may be of use to Jane, but more generally that this “prompting guide” may be of use to all of us trying to get decent Clarion code out of AI in the future.

cheers for now

Geoff R

RchdR · October 20, 2025, 8:17am

Because there are a lot of similarities with other languages, I think some AIs will use the techniques seen most frequently across a spread of languages in places like GitHub, rather than the performance-optimised techniques for one language like Clarion that you have highlighted to it.

Identifying which of these AIs learn from the user’s coding style will, I think, be key in picking a good AI model for optimised code specific to a language. But for situations where the online docs are out of date or incorrect, the AIs that draw on GitHub examples will be key.