StrPos() String Clipping - How to Match a Trailing Space?

I'm still trying to nail down why strpos was returning trailing spaces in the result, and I see that in c60runx.dll:cla$regular there are some 6dynstr calls, a couple of which have the word clip in them. *

String = '2<32><32><32><32>0'
Regex = '^[1-9]$'

If I step through the string 1 char at a time, so its length goes from 1 to 6, strpos will match and return 1 until the 6th char. This is wrong because it should only match when the string is just the first character, as chars 2-6 are not digits in the range 1 to 9.
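
For anyone wanting to reproduce the stepping test above, a minimal sketch might look like the following. The variable names (Txt, Rx, Report) are made up for illustration, and no expected output is shown because the result is exactly what is in question here.

```clarion
  PROGRAM
  MAP
  END

Txt     STRING('2<32><32><32><32>0')   ! '2', four spaces, '0'
Rx      STRING('^[1-9]$')              ! exactly one digit in the range 1-9
i       LONG,AUTO
Report  CSTRING(512)

  CODE
  LOOP i = 1 TO SIZE(Txt)
    ! Test the first i characters only; '|' is a line break in MESSAGE text
    Report = Report & 'len=' & i & '  StrPos=' & STRPOS(Txt[1 : i], Rx) & '|'
  END
  MESSAGE(Report, 'StrPos stepping test')
```

If strpos treated the trailing spaces as significant, only the len=1 line would be expected to show a non-zero result.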

So is it clipping the string, and if so, is there any way to turn this off?

Tia

* At this point I have not delved into the memory locations in the stack and followed the assembler. If I must, I must, but I would rather not.

It's string clipping.
In the debugger, set a breakpoint on strpos, then call up the disassembly window; the assembler should appear in the bottom left quadrant of the debugger.
Locate

>>>>>> C60RUNX.dll:Cla$Regular:

Then, using the outer (far right) scroll bar, because there are two, scroll down until you see

Call near C60RUNX.dll:_7RegExpr__find@FPcUiUiUi

Set a breakpoint on this line.

Then step through the search string char by char and watch the EBX register (right-click on EBX, then "examine memory pointed to by register"). You will see the string growing with each char, but in this example it's always
32h00h
then
32h00h20h
32h00h20h20h
32h00h20h20h20h
Etc
And then, only when it hits the second non-space character, i.e. the 0 in the above string, does it change to
32h20h20h20h20h20h30h
in the memory EBX points to.

So tldr, it is string clipping.

Now I have to think about whether I need to convert space (20h) to something like an underscore character, either automatically or via a new bitmask, to prevent the built-in string clipping.

Is there any way of editing the memory address contents directly?
I wanted to change the null to a space and see if I would get a different result from strpos.

Tia

Edit.

So, looking at my regexes that process the lines in clw's and txa's, I'm testing for the linefeed and continuation symbols at the end of the line, which I think is why I've not hit this problem before. I would still need to check those lines in the assembler and the EBX register, though, to be doubly sure.

Hi Richard, I think we determined there was a problem with trailing spaces being ignored a couple of years ago on this thread:

and specifically I gave some code to work around this on this post:

By the sound of this latest post above, that earlier effort must not have solved your problem?? If that’s the case then I am happy to have another look if you can please clearly state the problem and why that earlier StrPosLen function doesn’t solve it. Thanks.

cheers again

Geoff R

edit1: for clarity, this is the function I was referring to:

prototype :

StrPosLen Procedure(string pText, string pRegex, bool pExcludeTrailingSpaces=true),LONG !return the maximum matching string length

code:

StrPosLen  PROCEDURE (string pText,string pRegex,bool pExcludeTrailingSpaces) 
x        long,auto
max      long,auto
len      long
stPos    long
regex    &string

  CODE
  !if ~address(pText) then return 0. ! uncomment this line if you decide to pass pText by reference instead of value
  if size(ptext) = 0 or size(pRegex) = 0 then return 0.
  stPos = strPos(pText, pRegex) ! get start position
  if ~stPos then return 0. ! no match
  if stPos = size(pText) then return 1. ! match on last char
                                   
  if pRegex[size(pRegex)] = '$'
    regex &= pRegex  ! point at passed regex
  else
    regex &= new String(size(pRegex)+1)
    regex = pRegex & '$'
  end

  max = size(pText) - stPos ! max increment size
  loop x = 0 to max
    if strPos(pText[stPos : stPos+x],regex) 
       len = x + 1
    elsif len
       break
    end
  end
  if address(regex) <> address(pRegex) then dispose(regex).
  if len and pExcludeTrailingSpaces
    len = len(clip(pText[stPos : stPos+len-1]))
  end
  return len

Strpos is string clipping, so it's not directly using the string it's given to run a regex match on; it's creating its own string, which happens to be clipped of trailing spaces.

I've set my function up to automatically swap the spaces for underscores (5Fh), and the regex patterns I'm using to match are working. An added bonus is that I can search the regex for any spaces and change them to underscores.
Now, the underscore isn't hardcoded; it's a default character used if the coder doesn't pass an alternative replacement character like a full stop, hyphen or something else. This way there shouldn't be a clash where the 'Space Replacement Character' (aka SRC) is already in use elsewhere in the regex and the search string. If the character doesn't already exist in either, it can be used as the SRC.
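
The space-swapping idea described above could be sketched roughly as below. This is not the actual function (which is not posted in the thread); the name StrPosSwapSpaces, its parameters, and the -1 return for a missing or clashing replacement character are all assumptions made for illustration.

```clarion
! Assumed MAP prototype (hypothetical):
!   StrPosSwapSpaces PROCEDURE(STRING pText,STRING pRegex,STRING pSRC),LONG
StrPosSwapSpaces  PROCEDURE (STRING pText,STRING pRegex,STRING pSRC)
i        LONG,AUTO

  CODE
  IF SIZE(pSRC) = 0 THEN RETURN -1.           ! no replacement character supplied
  ! Refuse an SRC that is already in use in the text or the regex
  IF INSTRING(pSRC[1], pText, 1, 1) OR INSTRING(pSRC[1], pRegex, 1, 1)
    RETURN -1
  END
  ! pText and pRegex are value parameters, so only local copies are altered
  LOOP i = 1 TO SIZE(pText)
    IF pText[i] = ' ' THEN pText[i] = pSRC[1].
  END
  LOOP i = 1 TO SIZE(pRegex)
    IF pRegex[i] = ' ' THEN pRegex[i] = pSRC[1].
  END
  RETURN STRPOS(pText, pRegex)
```

A replacement character that is also a regex metacharacter (a full stop, for instance) would presumably need escaping in the regex, which this sketch does not attempt.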

I forgot that conversation from a few years back; too many long days and nights, and a computer OS that's like whack-a-mole.

I noticed you are using a string whereas I am using a new'ed cstring, so I changed mine to a new'ed string, and it's still clipping the string. So the only other difference is that your code works directly with the parameter, whereas I am new'ing the strings. Anyway, I just swapped strpos over to use the parameter instead of the dynamically created string, and it's not clipping, so the use of the new'ed cstring or string is what's causing strpos to clip.

Next question: are any other functions clipping when given new'ed strings or cstrings? It might explain a problem with instring a few years back, which, ironically, is what drove me to look at strpos!

So any variable that's not a passed parameter incurs the clipping of trailing spaces by strpos in the runtime. It doesn't matter if the variable is declared in the data section with its char length or as a reference to be new'ed later on in code; strpos will clip.

Haven't tried a global cstring yet.

I guess the forced use of a parameter in strpos means the runtime won't change it, basically because it's a parameter.

Edit.

You know the thing bugging me is why hasn't anyone else reported this? Even instring doesn't work as expected! Maybe the C6 copy Andy sent in the post was intercepted and the runtime altered, because my apps report unexpected compiler flags. And as I've said before, when is a bug not a backdoor?

Hi Richard

just for anyone following along at home, I have written an enhanced strPos function that returns the minimum and maximum lengths as well as the usual start position:

It is currently a “work in progress” in that it is not extensively tested, but seems to work on my tests at home.

Richard’s idea of replacing the spaces seems like a good idea to me due to the clipping Richard mentioned. The code in the above link tries to find an unused replacement character for spaces.

I’m a bit skeptical/sceptical about that making a difference. I suspect you made some other code change that altered things - but happy to be surprised and shown I am wrong if you can provide an example demonstrating this.

well instring() works fine for me. There are a couple of “tricks” - the default skip amount is the length of the substring being searched for, whereas usually you want it to be 1. And also if you pass your substring in a string field, you may need to clip it if you don’t want to include the trailing spaces in your search.
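
To illustrate the two "tricks" just mentioned, here is a small sketch. The names Txt, Find and Pos are invented for the example; the INSTRING argument order is (substring, string, step, start).

```clarion
  PROGRAM
  MAP
  END

Txt     STRING('are they in the dark  <32>')
Find    STRING(10)         ! fixed-size field, so it is padded with trailing spaces
Pos     LONG

  CODE
  Find = 'the'
  ! Step of 1 rather than the default step (the length of the substring),
  ! and CLIP so the padding in Find is not part of the search
  Pos = INSTRING(CLIP(Find), Txt, 1, 1)
  MESSAGE('Position = ' & Pos)
```

With this sample text the first "the" sits at column 5 (inside "they"), so that is the position this call would be expected to report. Without the CLIP, the search would be for "the" followed by seven spaces.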

I did find a bug in instring() around the turn of the century, where the substring was more than 255 chars, but that was corrected years back and I am not aware of any other bugs in it.

I’m with Geoff on this one, I’ve never known Instring() to not work as it should. The defaults are a bit daft and that occasionally bites, but it always works.

It's over 4 years ago now that I went down the strpos route because instring wasn't working properly; even adding the step amount and start position still didn't make it work.

The tldr is I just couldn't get a substring match out of instring, so I needed a new way to match substrings, hence my journey into strpos.

I even toggled between aslr, core protection and other windows security settings in case, aslr especially, was affecting any of it, but that all drew blanks.

I’ll post this for those that want to use STRPOS()…

STRPOS() does clip trailing spaces. The purpose of STRPOS is to match Regular Expressions. To make it work you have to think in Reg Ex. What Reg Ex will match a Space?

I can think of two: a Character Set [<32>] or a Group {{<32>}

Example to find the word “the ” with a trailing space and not the word “they” is below:

Str1 STRING('are they in the dark  <32>')    
    CODE   ! 123456789012   6
    X1 = StrPos(Str1,' the[ ]')   !X1=12 not 4
    X2 = StrPos(Str1,' the{{ }')  !X2=12 not 4

The first argument (the string being searched) also has its trailing spaces trimmed, so if you want to match spaces at the end you'll need to append a non-blank character; e.g. below I picked a tilde (~) to match "dark" plus a space at the end.

Str1 STRING('are they in the dark  <32>')    
    CODE   ! 123456789012   6
    X1 = StrPos(Str1 &'<32>~' ,' dark[ ]')   !X1=16

STRPOS() is different than INSTRING() which leaves the trailing spaces. Often this can result in bugs with spaces when forgetting to CLIP() with INSTRING. In the same way STRPOS requires a little extra thought with trailing spaces.

Message() code you can use to test STRPOS() and trailing spaces. It also verifies that INSTRING is not trimmed.

StrPosTrailSpaceRtn ROUTINE
    DATA   ! 1234                4 is wrong
Txt1 STRING('are they in the dark  <32>')    
    CODE   ! 123456789012   6
    Message('StrPos() deal with Trailing Spaces Clipped' & |
        '||Txt1="'& Txt1 &'"  (note trailing space)' & |
       '||InString(" the ",Txt1,1) ='   & InString(' the ' ,Txt1,1) &' <9>Instring w/ Trailing Space OK' &|
        '|InString(" dark ",Txt1,1) ='  & InString(' dark ',Txt1,1) &' <9>Instring w/ Trailing Space OK' &|
       '||1 StrPos(Txt1, " the") ='     & StrPos(Txt1,' the')      &' <9>Fail - No Trailing space in RegEx' &|
        '|2 StrPos(Txt1, " the ") ='    & StrPos(Txt1,' the ')     &' <9>Fail - Trailing space " " Ignored' &|
        '|3 StrPos(Txt1, " the[ ]") ='  & StrPos(Txt1,' the[ ]')   &' <9>[ ] Character Set works' &|
        '|4 StrPos(Txt1, " the{{ }") =' & StrPos(Txt1,' the{{ }')  &' <9>{{ } Group Space(s) works' &|
 '||5 StrPos(Txt1 is clip, " dark[ ]") =' & StrPos(Txt1     ,' dark[ ]')  &' <9>Fails Txt1 is Clipped' &|
  '|6 StrPos(Txt1 &<<0>, " dark[ ]") =' & StrPos(Txt1&'<0>' ,' dark[ ]')  &' <9>Txt1 & 32,0 Works' &|
        '|','StrPos Trail Spaces',Icon:Clarion,'Close',,MSGMODE:CANCOPY)

StrPos Trail Spaces Message Text

StrPos() deal with Trailing Spaces Clipped

Txt1="are they in the dark   "  (note trailing space)

InString(" the ",Txt1,1) =12 	Instring w/ Trailing Space OK
InString(" dark ",Txt1,1) =16 	Instring w/ Trailing Space OK

1 StrPos(Txt1, " the") =4 	Fail - No Trailing space in RegEx
2 StrPos(Txt1, " the ") =4 	Fail - Trailing space " " Ignored
3 StrPos(Txt1, " the[ ]") =12 	[ ] Character Set works
4 StrPos(Txt1, " the{ }") =12 	{ } Group Space(s) works

5 StrPos(Txt1 is clip, " dark[ ]") =0 	Fails Txt1 is Clipped
6 StrPos(Txt1 &<0>, " dark[ ]") =16 	Txt1 & 32,0 Works

Following on from Carl’s post I have found that adding a null (‘<0>’) on the end of a string works on both the data and the regex.

You do have to be careful with the nulls as when used with strpos they will terminate the string so they are no good anywhere other than at the very end of a string.

What I have done is written a wrapper for strPos called strPosUnclipped:

StrPosUnclipped      PROCEDURE  (*string pText,*string pRegex) !,LONG

  CODE
  return strpos(choose(size(pText) >0 and pText [size(pText)] =' ',pText &'<0>',pText), |
                choose(size(pRegex)>0 and pRegex[size(pRegex)]=' ',pRegex&'<0>',pRegex) )

If there are no spaces on the end it will call strPos() as normal, but where there are one or more spaces on the end it will append a null ‘<0>’ character to the string passed to strPos().

often when testing you want to pass by value so you can pass literals. As I am working in the IDE I cannot readily use overloading so I have written StrPosUnclippedByVal which simply accepts value parameters for the text and regex and calls the version with parameters passed by reference. Obviously you want to avoid using this version except where you want to use literals rather than a field.

StrPosUnclippedByVal PROCEDURE  (string pText,string pRegex) !,LONG
  CODE
  return strposUnclipped(pText,pRegex)
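
As a quick check of the by-value wrapper, a call along these lines could be used. The variable Pos is just for illustration, and the snippet assumes both wrappers above are prototyped in the MAP as returning LONG.

```clarion
Pos   LONG
  CODE
  ! The text ends in a space, so the wrapper appends '<0>' before calling strPos;
  ! the regex ends in ']', so it is passed through unchanged
  Pos = StrPosUnclippedByVal('are they in the dark  <32>', ' dark[ ]')
  MESSAGE('StrPosUnclippedByVal = ' & Pos)
```

This is the same situation as test 6 in Carl's Message() output above, which reported 16 for the equivalent StrPos(Txt1 & '<0>', ...) call.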

so now with this new StrPosUnclipped function, we can implement StrPosAndLen without all the jumping through hoops trying to do character substitution:

StrPosAndLen         PROCEDURE  (*string pText,*string pRegex,*long pMinLen,*long pMaxLen)
! returns position and minimum and maximum length of string matched against regex
x     long,auto
max   long,auto
stPos long              ! start position (return value)
dollarEnded string(size(pRegex)+1)                          
regex &string 

  CODE
  pMinLen = 0; pMaxLen = 0            ! initialise/clear
  if size(pText) = 0 or size(pRegex) = 0 then return 0.

  stPos = strPosUnclipped(pText, pRegex)           ! get start position
  if ~stPos or stPos > size(pText) then return 0.  ! no match
  if stPos = size(pText)        ! single char match
    pMinLen = 1
    pMaxLen = 1
    return stPos
  end
 
  if pRegex[size(pRegex)] = '$' and sub(pRegex,size(pRegex)-1,1) <> '\' 
    regex &= pRegex
  else
    dollarEnded = pRegex & '$'
    regex &= dollarEnded
  end

  max = size(pText) - stPos     ! max increment size
  loop x = 0 to max 
    if strPosUnclipped(pText[stPos : stPos+x],regex)  
       pMaxLen = x + 1
       if pMinLen = 0 then pMinLen = pMaxLen. 
    elsif pMaxLen
       break
    end
  end
  return stPos  ! return starting position

Note the minimum and maximum lengths are in the passed parameters (pMinLen and pMaxLen) while the start position is the returned value (same as with strPos).

The maxLen is handy as often you want the “leftmost longest” match and this code using strPosUnclipped() is much better (and simpler) than earlier versions trying to use a substitute character for spaces. So kudos to Carl for suggesting adding null on the end of a string.

Final comment: pText and pRegex are passed here by reference (for speed). As with StrPosUnclippedByVal() we can here do a wrapper for easy testing or use where we want to pass literals:

StrPosAndLenByVal    PROCEDURE  (string pText,string pRegex,*long pMinLen,*long pMaxLen)
  CODE
  return StrPosAndLen(pText,pRegex, pMinLen, pMaxLen)
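
A short usage sketch for the by-value version follows. The test procedure name, the sample text and the regex are made up for illustration, and no result is shown since I have not run it against every runtime version.

```clarion
TestStrPosAndLen  PROCEDURE   ! hypothetical test procedure
MinLen   LONG
MaxLen   LONG
Pos      LONG

  CODE
  Pos = StrPosAndLenByVal('abc123def', '[0-9]+', MinLen, MaxLen)
  IF Pos
    ! Start position is the return value; the lengths come back in the parameters
    MESSAGE('Start=' & Pos & '|MinLen=' & MinLen & '|MaxLen=' & MaxLen & |
            '|Longest match="' & SUB('abc123def', Pos, MaxLen) & '"')
  ELSE
    MESSAGE('No match')
  END
```

Using SUB() with the returned position and pMaxLen is how the "leftmost longest" match would be extracted from the text.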