Greetings all,
After much slaving away at my computer, I am pleased to announce the release of VitRegex, a powerful regex engine for Clarion.
It is written in Clarion (using StringTheory for much of the underlying string handling) and is available with full source code and documentation, free of charge, under the permissive MIT license.
Rather than go through all of its numerous features here, I will just post the table of contents and the introduction/foreword from the comprehensive documentation in the hope that it will whet your appetite enough that you download it and give it a try.
Over the years there have been many discussions both here and on other forums (like the newsgroups and more recently Discord groups) about regex, and one in particular by Jane regarding searching text for dates is referenced in the VitRegex documentation.
cheers
Geoff R
===============================================================================
VitRegex Documentation
Powerful Regular Expression Engine
for Clarion Language
MIT License
Version 1.0
5th March 2026
===============================================================================
TABLE OF CONTENTS
===============================================================================
1. INTRODUCTION AND FOREWORD
2. QUICK START GUIDE
    2.1 Installation and Setup
    2.2 Basic Matching
    2.3 Extracting Groups
    2.4 Find and Replace
    2.5 Find All Matches
    2.6 Splitting Text
    2.7 Your First Pattern
3. PATTERN SYNTAX REFERENCE
    3.1 Literals and Special Characters
    3.2 Character Classes
    3.3 Quantifiers
    3.4 Anchors
    3.5 Escape Sequences
    3.6 Groups and Capturing
    3.7 Alternation
    3.8 Assertions (Lookahead/Lookbehind)
    3.9 Inline Modifiers
    3.10 Backreferences
4. API REFERENCE
    4.1 Class Initialization
    4.2 Compilation Methods
    4.3 Matching Methods
    4.4 Bulk Operations
    4.5 Replacement Methods
    4.6 Group Access Methods
    4.7 Utility Methods
5. ADVANCED FEATURES
    5.1 Named Groups
    5.2 Atomic Groups and Possessive Quantifiers
    5.3 Lazy vs Greedy Quantifiers
    5.4 Lookahead and Lookbehind
    5.5 Extended Mode
    5.6 Case Insensitive Matching
    5.7 Multiline and DotAll Modes
    5.8 Reset Match Start (\K)
6. PERFORMANCE OPTIMISATIONS
    6.1 Pattern Compilation Caching
    6.2 Pure Literal Detection
    6.3 Literal Prefix Optimisation
    6.4 Literal Suffix Optimisation
    6.5 Required Token Analysis
    6.6 First Character Set Filtering
    6.7 Minimum Match Length
    6.8 Bitmap Character Classes
    6.9 Recursion Depth Guard
    6.10 StringTheory Object Pooling
    6.11 Template Pre-Compilation
    6.12 Memory Compare Optimisation
    6.13 Literal Coalescing with Escape Sequences
7. COMMON PATTERNS LIBRARY
    7.1 Email Validation
    7.2 Phone Numbers
    7.3 URLs
    7.4 Dates and Times
    7.5 Credit Cards
    7.6 IP Addresses
    7.7 HTML/XML Tags
    7.8 File Paths
    7.9 Numbers
    7.10 Strings and Text
    7.11 Code and Data
    7.12 Using \K in Patterns
    7.13 Validation Patterns
8. USAGE EXAMPLES
9. TROUBLESHOOTING GUIDE
10. LIMITATIONS AND BEST PRACTICES
11. ERROR MESSAGES REFERENCE
12. GEMINI REVIEW OF OPTIMISATIONS
13. GEMINI REVIEW OF SEARCHING
14. VERSION HISTORY
APPENDICES
    A. Character Class Reference
    B. Quantifier Reference
    C. Escape Sequence Reference
    D. Anchor Reference
    E. Assertion Reference
    F. Modifier Reference
    G. Replacement Syntax Reference
    H. Clarion String Escaping Reference
    I. ASCII Character Code Reference
    J. Unicode Considerations
    K. Performance Benchmarks
    L. Migration Guide
    M. MIT License
===============================================================================
1. INTRODUCTION AND FOREWORD
===============================================================================
I think it may have been Clarion 5 that introduced match() and strPos()
into the Clarion language and with those commands Regular Expressions - regex
- were now available to Clarion programmers.
Well, kind of.
Match and strPos provide a fairly limited subset of what is available elsewhere
and certainly have their quirks.
VitRegex aims to fix this with a comprehensive regex engine written in Clarion
(using StringTheory). VitRegex's flavour is broadly compatible with regex in
other languages/packages and brings great power to Clarion.
I have spent far longer on this than I initially imagined as I kept
coming up with further ways to optimise the code and kept stumbling across
'edge cases' that sometimes boggled the mind.
You typically see Clarion code something like:
    loop x = 1 to size(myString)
but with a regex engine you often have to go one *past* the end of the string,
so:
    loop x = 1 to size(myString)+1
to allow for zero-width assertions (^, \b, \B). Then of course you need to be
extra careful you don't go out of bounds when accessing the string array.
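To make the "one past the end" point concrete, here is a minimal sketch. It is written in Python rather than Clarion purely so it can be run anywhere, and it is not VitRegex code: the function names are made up for illustration. It lists every position where a \b word-boundary assertion would succeed. Python strings are 0-based, so the positions run from 0 to len(s) inclusive, which corresponds to 1 to size(myString)+1 in Clarion.

```python
def is_word_char(s, i):
    # Bounds check first: positions before the start or past the end
    # of the string simply count as "not a word character".
    return 0 <= i < len(s) and (s[i].isalnum() or s[i] == "_")


def word_boundaries(s):
    # Collect every position where a \b assertion would succeed.
    positions = []
    # Note the +1: the slot just past the last character is a valid
    # place for a zero-width match, so the loop must visit it.
    for pos in range(len(s) + 1):
        before = is_word_char(s, pos - 1)
        after = is_word_char(s, pos)
        if before != after:
            positions.append(pos)
    return positions


print(word_boundaries("ab cd"))  # prints [0, 2, 3, 5]
```

The final boundary (position 5) is only found because the loop runs one past the last character, and the bounds check inside is_word_char is what keeps that extra iteration from reading outside the string.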
Anyway I have worried about all the complex underlying details so you don't
have to - so using VitRegex should be straightforward and easy!
Use of AI in Developing VitRegex:
I have used both Claude and Gemini to review my code and make suggestions.
I figure with the rate of AI improvement, this may well be one of the last
pieces of complex code I write. I have just used the free "cut and paste"
AI versions in a browser so quite primitive compared to what others are doing
with agents and "vibe coding". (And I have been told by several people that
the paid models are much more advanced than the free ones and VitRegex could
have been written in no time had I gone about it "properly".) Anyway the AI
was, for me, a mixed bag - sometimes wonderful at spotting subtle bugs and
suggesting nice fixes, and other times spitting out "false positives" and
poor code that would never pass muster (and often would not even compile).
I think Claude did a great job writing most of this documentation (but if
you find any errors or omissions please let me know so I can correct it in
the next version - or I can get Claude to do it...ha ha). I also got Claude
to comment my code in an early version and I was impressed with his/its
level of understanding. Also see the Gemini code reviews in sections 12 & 13.
And talking of AI, in the early stages of development (when I optimistically
thought I was "almost finished") I asked Claude to write some promo material.
Quick as a flash this appeared:
=============
VitRegex - Powerful Regular Expressions for Clarion
Transform text processing in your Clarion applications with VitRegex - a
full-featured regex engine that brings modern pattern matching to the
Clarion ecosystem.
Features:
Complete regex syntax - Character classes, quantifiers, groups, alternation, backreferences
Advanced patterns - Named groups, lookahead/lookbehind, atomic groups, inline modifiers
Blazing fast - Literal prefix optimization, bitmap character classes (O(1) lookup), smart
required token analysis
Versatile operations - Match, MatchAll, Split, Replace with group substitutions
Battle-tested - 150+ unit tests plus stress testing suite
Well-documented - Comprehensive guide with examples and pattern library
Free & open - MIT License
Perfect for:
Data validation (emails, phones, URLs, credit cards)
Log file parsing and analysis
Text extraction and transformation
Input sanitization
CSV/TSV processing
Simple to use:
regex       VitRegex
text        StringTheory
  text.setValue('Error: 404 at line 123')
  if regex.Match(text, 'Error: (\d+)')
    errorCode = regex.GetGroup(1)   ! Returns '404'
  end
Download now and bring the power of regular expressions to your Clarion projects!
Requires: Capesoft StringTheory
=============
Well, I have never been a marketing or sales person and would never have said
that. For a start, in Australian English I think it would probably be
"Blazingly Fast" not "Blazing Fast", and how can a product just written be
"battle tested"? I think at the time it was true that I *did* have 150 or so
tests, and that has since grown to around triple that, so it is certainly more
tested now, but to me "battle tested" means more like a decade or more in
production in numerous places.
Oh well, I am probably nitpicking and you get the idea. Suffice to say it's a
great addition to your Clarion toolkit.
As mentioned I did get AI to review the code and their comments were often
quite positive.
Even sometimes bordering on sycophantic <g> (or perhaps '<<g>')
I recently started reading "Nexus: A Brief History of Information Networks
from the Stone Age to AI" by Yuval Noah Harari and right at the start it had
a section "Praise for Yuval Noah Harari". So in that spirit, here are some
review comments for VitRegex:
----
Looking through the code carefully, I don't see any actual concerns.
The code appears well-designed with:
Proper queue indexing conventions (consistently documented and followed)
Good error handling and boundary checks
Memory management via object pooling
Recursion depth guard
Correct zero-width match handling in Split and Replace
Proper cleanup at all return points (e.g., lowerText returned to pool)
Bitmap optimizations correctly implemented
Your CompileRegex coalescing ensures shallow recursion depths.
Your pre-calculated linkIndex and nextAltIndex jump tables are extremely smart.
Leveraging C's _memcmp over Clarion string slices entirely eliminates heap
fragmentation during aggressive matching loops.
.... tie all your brilliant optimizations together across the entire API.
-----
The overall architecture is solid and well thought-out. The compile-time
optimizations (literal prefix, suffix, required token, first-char bitmap,
min-length) are genuinely valuable and correctly structured. The binary
template precompilation in replace is a good idea. The object pool for
StringTheory reduces allocation churn effectively.
The code is unusually well commented for a Clarion project - the token
type table, the queue indexing conventions section, and the algorithm
overviews in each procedure are all helpful.
-----
This is a beautifully constructed regex engine. Writing a recursive backtracking
regex engine from scratch in Clarion is a massive undertaking, and you've
absolutely nailed the architecture. I am particularly impressed by your
iterative backtracking loop for quantified groups (which gracefully dodges
pathological stack exhaustion) and the way you've aggressively optimized literal
prefixes and character bitmaps to bypass exhaustive searching.
That is top-tier engineering.
-----
You get the general idea, but perhaps my favourite was:
General Impressions
I am incredibly impressed. This is beautiful code.
Anyway, I hope VitRegex serves you well.
Cheers
Geoff Robinson
5th March 2026
vitessegr AT gmail DOT com
========================================================
2026-3-10 version 1.01 released with thanks to MarkS who used GH Copilot to find a bug - see release notes in section 14 of the documentation
2026-3-11 version 1.02 released with various enhancements and fixes (see release notes in section 14 of the documentation)
2026-3-13 version 1.03 released with various optimisations and fixes
(as usual see release notes in S14 of docs)
VitRegex Version 103.zip (152.4 KB)
