Issue with {n} Quantifiers in Clarion’s MATCH Function

Hi,

I’ve been working with the MATCH function and encountered an issue with regex quantifiers {n}. Specifically, patterns using {n} or {n,m} seem to consistently fail. Here’s what I’ve observed:

  1. Examples That Fail:
  • ^[0-9]{8}$ (intended to match exactly 8 digits).
  • ^{A{3}|B{2}}$ (group with quantifiers).
  2. Examples That Work:
  • Simple alternation: {A|B|C} works fine.
  • Exact literal match: '123-4567-8' matches as expected.
  • Grouping and alternation without quantifiers also work.
  3. Testing Environment:
  • Clarion version: 11.1.13855
  • Regex mode: Match:Regular
  4. Question:
  • Has anyone successfully used {n} quantifiers in Clarion’s regex implementation?
  • Are there known limitations or workarounds for this?
  • Is there any additional escaping required for {n} in Clarion beyond doubling { as {{?

Any insights or examples from your experience would be greatly appreciated, even if the answer is that Clarion’s regex doesn’t support this.

Thank you

Mark

AFAIK Clarion’s RegEx syntax does not implement a {repeat} pattern count. The Help does not mention one.

Most, if not all, other RegEx syntaxes use ( parens ) for groups, while Clarion uses { curly braces }, so I think it would have been too ambitious to also use them for repeat counts.

For example I code American SSN numbers as:

[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]

I wish there was a {repeat} to allow:

[0-9]{3}-[0-9]{2}-[0-9]{4}

You can probably use Clarion’s normal string literal feature, where {n} repeats the preceding constant character.
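For engines without a repeat count, the long form can also be generated instead of typed. A minimal sketch in Python (not Clarion) that builds the hand-written SSN pattern above by string repetition and checks that it behaves like the {n} form; the names `DIGIT`, `ssn_long`, `ssn_short` and `same_verdict` are illustrative only:

```python
import re

DIGIT = "[0-9]"

# Build the long-hand pattern by repeating the character class.
ssn_long = DIGIT * 3 + "-" + DIGIT * 2 + "-" + DIGIT * 4

# The quantifier form that an engine with {n} support would accept.
ssn_short = "[0-9]{3}-[0-9]{2}-[0-9]{4}"

def same_verdict(text: str) -> bool:
    """True when both patterns agree on whether text is a full match."""
    long_hit = re.fullmatch(ssn_long, text) is not None
    short_hit = re.fullmatch(ssn_short, text) is not None
    return long_hit == short_hit
```

In Clarion the same repetition could presumably be done with string concatenation or the ALL() function, but that is an assumption and untested here.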


Thanks Carl,

I was wishing for a present for Christmas, but I’ll live with the Clarion Tax :wink:

Mark

Hi Mark and Carl

Over on the newsgroups back in March last year, Alberto asked about this and I knocked up a function for him to expand the regex so it worked in Clarion. I released it under the MIT license so it is unencumbered.

ho ho ho - here’s a Christmas present and no need for any Clarion tax!

hth and Merry Christmas!

Geoff R

prototype is:
 
ExpandRegEx       PROCEDURE(string pRegex),string

and the code is:

ExpandRegEx       PROCEDURE (string pRegex)
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
!
! ExpandRegEx - expand repetitions 
! eg. [0-9]{1,3} expands to [0-9][0-9]?[0-9]?
!     [0-9]{0,4} expands to [0-9]?[0-9]?[0-9]?[0-9]?
!
! Version 1.0 written on 15th March 2023 by Geoff Robinson vitesse AT gmail DOT com 
!
! MIT License
!
! Copyright (c) 2023 Geoffrey C. Robinson
!
! Permission is hereby granted, free of charge, to any person obtaining a copy
! of this software and associated documentation files (the "Software"), to deal
! in the Software without restriction, including without limitation the rights
! to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
! copies of the Software, and to permit persons to whom the Software is
! furnished to do so, subject to the following conditions:
!
! The above copyright notice and this permission notice shall be included in all
! copies or substantial portions of the Software.
!
! THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
! IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
! AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
! LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
! OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
! SOFTWARE.
!
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

st        StringTheory
ln        StringTheory     ! line
stTmp     StringTheory     ! temp worker

c         string(1),auto   ! char
b         byte, over(c)  

x         long,auto
y         long,auto

prevLen   long
prevLen2  long

minRepeat long,auto
maxRepeat long,auto

  code
  st.setValue(pRegex,st:clip)
  loop x = 1 to st._dataEnd
    c = st.valuePtr[x]
    case b
    of val('[')
      y = st.matchBrackets('[',']',x)
      if y = x + 1 ! allow for where ] is included char
        y = st.findChar(']',y+1)
      end
    of val('{{')
      y = st.matchBrackets('{{','}',x)
    else
      y = 0
    end
    if y
      st.addLine(st:end,st.valuePtr[x : y])
      x = y 
    else
      st.addLine(st:end,c)
    end
  end 

  st.free()  
  loop x = 1 to st.records()
    prevLen2 = prevLen
    prevLen  = st._dataEnd
    ln.setValue(st.getLine(x))
    st.append(ln)
?   assert(ln._dataEnd > 0)
    if ln._dataEnd = 1        or |
       ln.valuePtr[1] <> '{{' or |
       ln.valuePtr[ln._dataEnd] <> '}'
      cycle
    end
    stTmp.setValue(ln)
    stTmp.crop(2, stTmp._dataEnd - 1) ! strip { and }
    stTmp.split(',')
    if stTmp.records() <> 2 then cycle.

    stTmp.setValue(stTmp.getLine(1))
    stTmp.trim()
    if not stTmp.isAllDigits() then cycle.
    minRepeat = stTmp.getValue()

    stTmp.setValue(stTmp.getLine(2))
    stTmp.trim()
    if not stTmp.isAllDigits() then cycle.
    maxRepeat = stTmp.getValue()

    if minRepeat > maxRepeat then cycle.

    st.setLength(prevLen)  ! remove repeat group eg. {1,4}
    stTmp.setValue(st.slice(prevLen2+1)) ! tok to be repeated
    st.setLength(prevLen2) ! remove tok to be repeated

    loop y = 1 to maxRepeat
      st.append(stTmp)
      if y > minRepeat then st.append('?'). ! optional?
    end
  end 

  if st._dataEnd < 1
    return ''
  else
    return st.valuePtr[1 : st._dataEnd]
  end

EDIT 1: I see Alberto’s requirement/request was for {min,max}, so this is probably only part of the solution, as it probably doesn’t cater for an exact number such as {5}. I guess you could use {5,5}, but that is not ideal. (Sorry, I don’t have time to look further right now as I’ve got to get the house organised for extended family coming for Christmas!)

OK I have done enough chores for today and the rest can wait until tomorrow, so I have had a quick look and done minimal tweaks to this to allow for specifying an exact number of repeats as well as a range. Basically when an exact number is specified the min and max number of repeats are set to the same.

To use this with match() or strpos() commands (or st.FindMatch()) simply wrap the regex in this function.

so if you have

match(myString, myRegex, Match:Regular)

where myRegex is, say, ‘^[0-9]{8}$’ or ‘^{A{3}|B{2}}$’ (which were Mark’s first two examples) or something with a range of repetitions like ‘[0-9]{1,3}’, then instead say:

match(myString, ExpandRegex(myRegex), Match:Regular)

so without further ado:

prototype is:
 
ExpandRegEx       PROCEDURE(string pRegex),string

and the code is:

ExpandRegEx       PROCEDURE (string pRegex)
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
!
! ExpandRegEx - expand repetitions 
! eg. [0-9]{1,3} expands to [0-9][0-9]?[0-9]?
!     [0-9]{0,4} expands to [0-9]?[0-9]?[0-9]?[0-9]?
!     [0-9]{5}   expands to [0-9][0-9][0-9][0-9][0-9]
!
! Version 1.0 written on 15th March 2023 by Geoff Robinson vitesse AT gmail DOT com 
! Version 2.0 written on 23rd December 2024 again by Geoff Robinson
!           - minimal changes to now allow an exact number of repeats
!
! MIT License
!
! Copyright (c) 2023 Geoffrey C. Robinson
!
! Permission is hereby granted, free of charge, to any person obtaining a copy
! of this software and associated documentation files (the "Software"), to deal
! in the Software without restriction, including without limitation the rights
! to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
! copies of the Software, and to permit persons to whom the Software is
! furnished to do so, subject to the following conditions:
!
! The above copyright notice and this permission notice shall be included in all
! copies or substantial portions of the Software.
!
! THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
! IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
! FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
! AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
! LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
! OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
! SOFTWARE.
!
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
ExpandRegEx          PROCEDURE  (string pRegex)            ! Declare Procedure
st        StringTheory
ln        StringTheory     ! line
stTmp     StringTheory     ! temp worker

c         String(1),auto   ! char
b         byte, over(c)  

x         long,auto
y         long,auto

prevLen   long
prevLen2  long

minRepeat long,auto
maxRepeat long,auto
  code
  st.setValue(pRegex,st:clip)
  loop x = 1 to st._dataEnd
    c = st.valuePtr[x]
    case b
    of 91   ! val('[')
      y = st.matchBrackets('[',']',x)
      if y = x + 1 ! allow for where ] is included char
        y = st.findByte(93, y+1)  ! find next ']'
      end
    of 123  ! val('{{')
      y = st.matchBrackets('{{','}',x)
    else
      y = 0
    end
    if y
      st.addLine(st:end,st.valuePtr[x : y])
      x = y
    else
      st.addLine(st:end,c)
    end
  end 

  st.free()  
  loop x = 1 to st.records()
    prevLen2 = prevLen
    prevLen  = st._dataEnd
    ln.setValue(st.getLine(x))
    st.append(ln)
?   assert(ln._dataEnd > 0)
    if ln._dataEnd = 1        or |
       ln.valuePtr[1] <> '{{' or |
       ln.valuePtr[ln._dataEnd] <> '}'
      cycle
    end           
    stTmp.setValue(ln)
    stTmp.crop(2, stTmp._dataEnd - 1) ! strip { and }
    stTmp.split(',')

    stTmp.setValue(stTmp.getLine(1))
    stTmp.trim()
    if not stTmp.isAllDigits() then cycle.
    minRepeat = stTmp.getValue()
      
    case stTmp.records()
    of 1
      maxRepeat = minRepeat ! exact number of repeats eg. {5}
    of 2
      stTmp.setValue(stTmp.getLine(2))
      stTmp.trim()
      if not stTmp.isAllDigits() then cycle.
      maxRepeat = stTmp.getValue()
      if minRepeat > maxRepeat then cycle.
    else
      cycle
    end

    st.setLength(prevLen)  ! remove repeat group
    stTmp.setValue(st.slice(prevLen2+1)) ! tok to be repeated
    st.setLength(prevLen2) ! remove tok to be repeated

    loop y = 1 to maxRepeat
      st.append(stTmp)
      if y > minRepeat then st.append('?'). ! optional?
    end
  end 

  if st._dataEnd < 1
    return ''
  else
    return st.valuePtr[1 : st._dataEnd]
  end
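
The idea behind the routine can be illustrated outside Clarion. The following is a simplified Python sketch of the same two passes (tokenise, then expand), not a port of the StringTheory code; all function names are made up for illustration. In this sketch a quantifier nested inside a brace group is left untouched; whether the Clarion routine behaves the same way depends on StringTheory's matchBrackets and is not established here.

```python
def find_closing_brace(pattern: str, start: int) -> int:
    """Index of the '}' that balances the '{' at start, or -1."""
    depth = 0
    for j in range(start, len(pattern)):
        if pattern[j] == "{":
            depth += 1
        elif pattern[j] == "}":
            depth -= 1
            if depth == 0:
                return j
    return -1

def tokenize(pattern: str) -> list:
    """Split into [...] classes, {...} groups and single characters."""
    tokens, i = [], 0
    while i < len(pattern):
        end = -1
        if pattern[i] == "[":
            # A ']' directly after '[' is a literal member, so skip it.
            end = pattern.find("]", i + 2)
        elif pattern[i] == "{":
            end = find_closing_brace(pattern, i)
        if end == -1:
            end = i  # plain single-character token
        tokens.append(pattern[i:end + 1])
        i = end + 1
    return tokens

def parse_repeat(token: str):
    """Return (min, max) for '{n}' or '{n,m}', otherwise None."""
    if len(token) < 3 or token[0] != "{" or token[-1] != "}":
        return None
    parts = [p.strip() for p in token[1:-1].split(",")]
    if len(parts) > 2 or not all(p.isascii() and p.isdigit() for p in parts):
        return None
    lo, hi = int(parts[0]), int(parts[-1])
    return (lo, hi) if lo <= hi else None

def expand_regex(pattern: str) -> str:
    """Replace X{n} / X{n,m} with repeated copies of the preceding token X."""
    out = []
    for tok in tokenize(pattern):
        rep = parse_repeat(tok)
        if rep is None or not out:
            out.append(tok)  # ordinary token, or nothing to repeat
            continue
        lo, hi = rep
        prev = out.pop()  # the token being repeated
        for n in range(1, hi + 1):
            out.append(prev + ("?" if n > lo else ""))  # extras are optional
    return "".join(out)
```

For example, `expand_regex("[0-9]{1,3}")` gives `[0-9][0-9]?[0-9]?`, matching the expansion described in the header comment.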

Carl, the first digit is 0-8 (no 9s), also the first group can’t be 000 or 666. The 2nd and 3rd grouping can’t be all 0s.
This is what I use, but I use PCRE library and not MATCH. Honestly, MATCH kind of sucks, IMO.

'\b(?!000)(?!666)[0-8][0-9]{{2}[- ](?!00)[0-9]{{2}[- ](?!0000)[0-9]{{4}\b'
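
For comparison, here is how that pattern might be exercised in Python's `re` module, which accepts the same lookahead and `\b` syntax. This is only a sketch: the doubled `{{` is Clarion string-literal escaping and becomes a single `{` here, and `SSN_STRICT` and `looks_like_ssn` are illustrative names.

```python
import re

# Same pattern as above, with Clarion's '{{' written as a plain '{'.
SSN_STRICT = re.compile(
    r"\b(?!000)(?!666)[0-8][0-9]{2}[- ](?!00)[0-9]{2}[- ](?!0000)[0-9]{4}\b"
)

def looks_like_ssn(text: str) -> bool:
    """True when text contains a value the strict pattern accepts."""
    return SSN_STRICT.search(text) is not None
```

The negative lookaheads reject 000 and 666 in the first group and all-zero second and third groups, while `[0-8]` rules out a leading 9.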

Yes my MATCH example is just the basic formatting of ###-##-####. It was to show how Clarion RegEx deals with repeat count, which it does not.

I’m familiar with the SSN rules. I support ITINs that always start with a 9, and maybe 8 but I think that changed. Once the basic number format matches, I would then show warnings for invalid segments.

I want to allow 666 and Zero segment numbers for testing, with a warning. The new IRS IRIS CSV upload accepts 666 numbers without any complaint, which is great for testing.
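
The "match the basic format first, then warn" approach could look roughly like the sketch below. It is written in Python rather than Clarion, the function name and warning texts are invented for illustration, and the checks are limited to the rules mentioned in this thread.

```python
import re

# Basic ###-##-#### shape only; no segment rules at this stage.
BASIC = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")

def ssn_warnings(text: str):
    """Return None if the basic format fails, else a list of warnings."""
    m = BASIC.fullmatch(text)
    if m is None:
        return None
    first, second, third = m.groups()
    warnings = []
    if first in ("000", "666"):
        warnings.append("first group " + first + " is not valid for an SSN")
    if first[0] == "9":
        warnings.append("starts with 9: possibly an ITIN rather than an SSN")
    if second == "00":
        warnings.append("second group is all zeros")
    if third == "0000":
        warnings.append("third group is all zeros")
    return warnings
```

A value such as 666-00-0000 is still accepted, which keeps it usable for testing, but it comes back with three warnings.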

This one here?

https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html

StrPos and Match are challenging, but I’ve worked out how to replicate some of the extra MS regex functionality in Clarion.

StrPos and Match just need to be built upon; for instance, I have a different way of doing the {n} repetition from what Vitesse has posted. Time is the factor I don’t have enough of, though, in a spinning-plates scenario.
