New Line in StrPos - is it ascii linefeed <10> or Carriage Return <13>?

RchdR · October 28, 2024, 3:17pm

I suppose I could work this out by testing for it, but in the help docs for strpos, with some of the regex operators, the reference to a ‘newline’ exists.

The period that matches any single character or the [^ ] complemented character set refer to ‘newline’.

What character code is ‘new line’?

Is it the ascii linefeed or carriage return character code or… something else?

Tia

urayoan · October 28, 2024, 4:43pm

I use <13,10>

Not sure if that’s what you’re looking for (sounds like a Jedi).

jslarve · October 28, 2024, 5:08pm

This seems like something you’d need to test in Clarion to be sure.

Maybe CRLF or LF are treated the same? No idea.
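A minimal sketch of such a test, assuming only the documented `STRPOS(string, regex)` form; the labels `LF` and `CR` are made up for illustration. It reports whether the `.` operator matches each candidate character, and it makes no claim about what your runtime will answer.

```clarion
  PROGRAM
  MAP
  END

LF    STRING('<10>')             ! ASCII line feed (hypothetical label)
CR    STRING('<13>')             ! ASCII carriage return (hypothetical label)

  CODE
  ! STRPOS returns 0 when the regular expression finds no match,
  ! so a "no" means "." refused to match that character.
  MESSAGE('"." matches LF: ' & CHOOSE(STRPOS(LF,'.') > 0,'yes','no') & |
          '|"." matches CR: ' & CHOOSE(STRPOS(CR,'.') > 0,'yes','no'))
```

Whichever character comes back as "no" would be the one the help treats as ‘newline’ for the `.` operator; the same pair of calls with a `[^a]` pattern would cover the complemented character set.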

RchdR · October 28, 2024, 6:20pm

How does that work in strpos then, if it’s 2 chars?

Jedi?

RchdR · October 28, 2024, 6:21pm

That’s what I’m working on, but I’m also testing for chars not used in the search string or the regex, so that I can swap the space char for one of them if any spaces are present.

Using the param as the search string is out of the question, and replacing the space char with something not used is the only way to stop the string from being clipped unless…

I go down the route of rewriting the RTL, which is a path I don’t want to go down.

Keyboard on this phone gets hacked on this website straight after switching it on, like wtf!?!

PaulAttryde · October 28, 2024, 8:44pm

It could be a 10, a 13, or a 10/13 or 13/10 pair. Depends on what they were feeling when they wrote it.
Just for giggles this is what the C6 help says. It’s the newest version I use …

strpos

RchdR · October 28, 2024, 10:42pm

I’m just counting the instances of each char in the search string and will use a char that’s not in the search string to swap with space.

Only got 1 to 255 ascii choices to choose from! Until unicode appears…
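A rough sketch of that counting idea, not the actual code from this post: `FindUnusedChar` is a hypothetical name, and it assumes both the search string and the regex should be counted. It does not skip regex special characters, which you would probably want to exclude as well.

```clarion
  PROGRAM
  MAP
    FindUnusedChar PROCEDURE(STRING pText,STRING pRegex),BYTE
  END

  CODE
  MESSAGE('Spare character code: ' & FindUnusedChar('some text','s.me'))

FindUnusedChar  PROCEDURE(STRING pText,STRING pRegex)
counts  LONG,DIM(255)       ! counts[n] = occurrences of chr(n); starts at zero
i       LONG,AUTO
v       BYTE,AUTO
  CODE
  LOOP i = 1 TO SIZE(pText)             ! tally every char in the search string
    v = VAL(pText[i])
    IF v THEN counts[v] += 1.           ! chr(0) has no slot, so skip it
  END
  LOOP i = 1 TO SIZE(pRegex)            ! tally every char in the regex
    v = VAL(pRegex[i])
    IF v THEN counts[v] += 1.
  END
  LOOP i = 255 TO 1 BY -1               ! highest unused code wins
    IF counts[i] = 0 THEN RETURN i.
  END
  RETURN 0                              ! every code 1..255 is in use
```

A return value of 0 means there is no spare character, so the caller has to fall back to some other way of handling spaces.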

Need to brush up on shims…

vitesse · October 29, 2024, 12:31am

Hi Richard

I was just working on a solution based on another thread where you were having problems with strpos clipping spaces and I also am searching for unused characters starting at chr(255) and working down.

I have written a new function “StrPosAndLen” which returns the starting position as does strpos() but also has two parameters where it returns the minimum and maximum lengths.

It finds a replacement character for space (if possible) as you have mentioned.

Usually I use StringTheory for anything like this but I seem to recall you are not using ST so I have done it the old long-hand way, which means this is a bit longer than it needs to be.

Anyway this is probably a “work in progress” but it passes my limited testing -can you please test it in your environment and let me know if it works for you. If it doesn’t work in your tests can you please provide the failing tests for me to check.

Hope this helps & cheers

Geoff R

StrPosAndLen         PROCEDURE  (string pText,string pRegex,*long pMinLen,*long pMaxLen)

! (c) 2024 Geoff Robinson Vitessegr at gmail dot com  
! 29 October 2024
! released under the MIT License https://opensource.org/license/mit
!
! 1 November 2024 - move check to get start position to be AFTER space substitution
!                 - note: this code does not cope with ranges when replacing spaces!! 

x     long,auto
max   long,auto
b     byte
c     string(1),over(b) ! single char over b
stPos long              ! start position (return value)
dollarEnded string(size(pRegex)+1)                          
regex &string 

  CODE
  pMinLen = 0; pMaxLen = 0            ! initialise/clear
  if size(pText) = 0 or size(pRegex) = 0 then return 0.

  if instring(' ',pText) or instring(' ',pRegex) ! includes space so need to replace spaces if possible
    loop b = 255 to 1 by -1
      if instring(c,pText)            then cycle.
      if instring(c,pRegex)           then cycle.
      if instring(c,'^$.[]|{{}*+?\-') then cycle.   ! avoid special regex chars
      break
    end
  end

  if b   ! if we have a replacement char for space then do replacements 
    loop x = 1 to size(pText)
      if pText[x] = ' ' then pText[x] = c.
    end
    loop x = 1 to size(pRegex)
      if pRegex[x] = ' ' then pRegex[x] = c.
    end
  end

  stPos = strPos(pText, pRegex) ! get start position
  if ~stPos then return 0.      ! no match
  if stPos = size(pText)        ! single char match
    pMinLen = 1
    pMaxLen = 1
    return stPos
  end

  if pRegex[size(pRegex)] = '$' and sub(pRegex,size(pRegex)-1,1) <> '\' 
    regex &= pRegex
  else
    dollarEnded = pRegex & '$'
    regex &= dollarEnded
  end                         
 
  max = size(pText) - stPos     ! max increment size
  loop x = 0 to max
    if strPos(pText[stPos : stPos+x],regex)
       pMaxLen = x + 1
       if pMinLen = 0 then pMinLen = pMaxLen.  
    elsif pMaxLen
       break
    end
  end
  return stPos  ! return starting position
!----
#Edit1 added '-' in regex chars to avoid "if instring(c,'^$.[]|{{}*+?\-') then cycle."
#Edit2 moved check to get start position to be AFTER space substitution
#Edit3 needed to move code to append $ to regex to be AFTER check start pos

#Edit4 (7th November 2024) I’ve just posted a much better solution to this problem (fixing issues with trailing spaces WITHOUT trying to find a substitution character) at:

StrPos() String Clipping - How to Match a Trailing Space?

RchdR · October 29, 2024, 12:51am

It’s more code than I’ve written, but I ain’t copying off my phone and not plugging my laptop in either, so sorry, no can do.

vitesse · October 29, 2024, 12:53am

OK no worries, I was just trying to help you solve the problem you were having.

it may, at the least, give you an idea or three.

cheers

Geoff R

RchdR · October 29, 2024, 10:14am

I think my code is more open to change when I get onto unicode, mainly because that’s a bit of a rabbit hole.

But thanks!

vitesse · October 29, 2024, 11:33am

No worries Richard. One’s own code is often better for yourself as it reflects the way you think and go about things. Someone else’s code sometimes looks foreign. TBH I didn’t realise you had sorted it out otherwise I probably wouldn’t have worried about writing it.

If you want to, feel free to post your code here for comparison. There are always many ways to do things and one can often learn and get ideas by studying other people’s approach (and code). [Even, or perhaps especially, if it appears “foreign”]

One thing I did think of later is that the replacement of spaces with another character could cause a problem where there is a range of characters specified. Two examples:

‘<32>-<47>’

if the space character is changed to a different char, say <255> then the range is changed (in this case it would be invalid).

‘<0>-<47>’

in this case a space is included in the range specified, but if all spaces in the text are altered to, again say <255>, then suddenly those “spaces” are now out of the specified range.
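To make the range problem concrete, here is a small sketch. It assumes the range is written inside square brackets and that the space in the text has already been swapped for `<255>`; the labels `original`, `swapped` and `pattern` are made up for illustration.

```clarion
  PROGRAM
  MAP
  END

original  STRING('a b')          ! contains a space, chr(32)
swapped   STRING('a<255>b')      ! same text with the space swapped for chr(255)
pattern   STRING('[<32>-<47>]')  ! bracket range from chr(32) to chr(47)

  CODE
  ! If the swap changes the meaning of the match, the two results differ.
  MESSAGE('Original text: ' & STRPOS(original,pattern) & |
          '|Swapped text: ' & STRPOS(swapped,pattern))
```

If the two positions differ, the substitution has changed what the range matches, which is the failure described above.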

more thought required!

Cheers again.

RchdR · October 29, 2024, 12:09pm

Editing regex patterns is one of those things. Because I’m looping through each ASCII code counting how many instances exist in the search string, and it’s entirely possible I may only be able to swap the space with a letter, I also have to look at whether strpos is doing a case-sensitive search or not. If it’s not case sensitive, I can’t use the lower/upper case letter that’s not being used. So if not case sensitive, then I have 255 ASCII codes - 26 to play with. I haven’t even looked at the other letters in ASCII with dots and squiggles on them to see if they are affected by case-sensitive searches.

I do wonder if the clipping of the search string is why Borland used the word Turbo back then knowing pcs were very slow back then and the split that saw some employees end up in the Clarion world… Or maybe they just got caught up in the 930 widow maker hype of the time? Who knows???

Trying to think of positive reasons why this clip exists. A positive slant is always good, a porsche slant nose is even better!

RchdR · October 29, 2024, 9:48pm

Who are or were ‘they’ anyway, or is it a shroud of mystery?

Anyway, because I can’t be sure how other people’s computers are going to behave, I put some code in to test which ASCII codes can be used, and the function has an optional bitmask that returns a list of ASCII codes that can’t be used alongside this magical ‘newline’.

I think that will be useful for other coders and their end users, otherwise strpos just remains like voodoo and the field validation can’t be ramped up a notch.

A demo app could be useful to see if any clarion devs see any differences to other dev machines, because the paranoid survive.

It’s also a good built-in unit test to have, testing for the ASCII codes that can or can’t be used.

vitesse · October 29, 2024, 10:28pm

and you also probably want to avoid the “special” regex chars. See in my example above I am avoiding these:

if instring(c,'^$.[]|{{}*+?\-') then cycle.   ! avoid special regex chars

RchdR · October 30, 2024, 12:52am

I have them as well, but I’m cycling through the ASCII codes, using strpos and some regex patterns to see what gets returned, to work out what’s acceptable or not. Then, depending on the bitmask option, it returns an error code or changes the regex string and search string. All to get around the string clipping, but as a param doesn’t get clipped, if push comes to shove I could move the strpos line into its own procedure and use the parameters so they don’t get clipped. And then just pass a search string that keeps having a char added to it.
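A minimal way to check whether passing through parameters really changes the clipping might look like the sketch below. `StrPosViaParams` is a hypothetical wrapper name, and the test data is picked so that a clipped and an unclipped regex give different positions; it makes no claim about which result you will see.

```clarion
  PROGRAM
  MAP
    StrPosViaParams PROCEDURE(STRING pText,STRING pRegex),LONG
  END

haystack  STRING('abc b d')   ! 'b' first occurs at 2, 'b ' first occurs at 5
needle    STRING('b ')        ! regex ending in a space

  CODE
  ! 5 means the trailing space was kept; 2 means it was clipped to 'b'.
  MESSAGE('Direct call: ' & STRPOS(haystack,needle) & |
          '|Via parameters: ' & StrPosViaParams(haystack,needle))

StrPosViaParams PROCEDURE(STRING pText,STRING pRegex)
  CODE
  RETURN STRPOS(pText,pRegex)   ! same call, but on the passed parameters
```

If both lines show the same number, the wrapper makes no difference on that machine.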

It’s interesting how the code forces us to work in a particular way.

But I am testing for all the ASCII codes in case something crops up, and trying to keep it open for unicode. I haven’t even tried putting some unicode through it yet to see if it can handle it. As unicode is just paired bytes, I think it could work.

There’s a hell of a lot of intelligence gone into this stuff, not my stuff, but the runtime, the obfuscated ABC classes, strpos, parameter passing etc etc. The Clarion stuff is not a small dev team, unless HFA, but I only have to look back at what we got taught at school: it was nothing in comparison to what I’ve learnt over the years, and even then I know there’s tricks of the trade I still don’t know.

The two year exam courses could be stripped back to a couple days of usable content, a week at most, the rest was just irrelevant filler, makes me look back at school and college as just holding patterns. Time wasters!

vitesse · October 30, 2024, 1:28am

Hi Richard, can you provide a stripped down minimal example that shows this? I mentioned previously that I was sceptical but perhaps I am misunderstanding you.

I think the idea that Unicode is just paired bytes is a common misunderstanding originating from the obsolete “UCS-2” (which stands for 2-byte Universal Character Set).

Bruce did a ClarionLive presentation some time back that might be worth watching. The presentation starts about 17 minutes in:

seanh · October 30, 2024, 2:20am

Umm No. UTF8 can be 1 to many bytes for a single ‘character’. There can be layering. That clarionlive presentation by Bruce is a decent place to get an update on it all.
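As a quick illustration of the byte counts, here is a small sketch that stores the UTF-8 encodings of two characters in ordinary Clarion strings. The byte values are the standard UTF-8 encodings of U+00E9 and U+20AC; the labels `eAcute` and `euro` are just for the example.

```clarion
  PROGRAM
  MAP
  END

eAcute  STRING('<195,169>')      ! UTF-8 bytes of U+00E9 (e with acute accent)
euro    STRING('<226,130,172>')  ! UTF-8 bytes of U+20AC (euro sign)

  CODE
  ! One visible character each, but 2 and 3 bytes of storage respectively.
  MESSAGE('e-acute: ' & SIZE(eAcute) & ' bytes|euro sign: ' & SIZE(euro) & ' bytes')
```

Any routine that walks a string one byte at a time, as strpos-style code on ANSI strings does, will see these as several separate “characters”.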

RchdR · October 30, 2024, 10:34am

I read something which explained the workings of a unicode std, it might have been MS, wiki or unicode themselves, but it explained the workings quite well.

Anyway, it was just using paired bytes, but now I see it can be more than two bytes and I can’t find what it was I read; that was a few years back as well.

RchdR · October 30, 2024, 10:36am

The steps are in one of my previous posts that refers to 6 or 7 dynamic strings in the assembler. I’m not putting the laptop online; the TV played up yesterday when it was on YouTube.

And the keyboard is now playing up on this phone!

It’s no big deal, I’ll just plod along at my own pace.

I don’t take AV stuff in, especially sounds; I take it in from reading content. You wouldn’t believe the tests the NHS subjected me to as a kid! Legal system should be destroyed for that alone!