StringTheory replaceBetween method problem after I upgraded from 3.54 to 3.80

I have this code to fill in placeholders in a template text file, processing one line at a time:

    ThisLabelLine.SetValue(ASC:Tekst)
    ThisLabelLine.replaceBetween('"','"','#NRKARTY#','KP' & KAR:KodLokalizacji & '/' & KAR:NrKartyPojazdu)
    ThisLabelLine.replaceBetween('"','"','#DATAPRZYJECIA#', FORMAT(KAR:DataPrzyjecia,@D17))      
    ThisLabelLine.replaceBetween('"','"','#MARKAMODEL#', CLIP(MOD:MarkaModel))
    [...]    
    ASC:Tekst = ThisLabelLine.GetValue()

I’m looping through a template text file that looks like this (these are commands for an Eltron Zebra label printer):

N
q750
Q380,20
I8,B,001
A750,20,1,1,2,2,N,"Karta Pojazdu nr:"
A710,20,1,1,9,9,N,"#NRKARTY#"
A610,20,1,1,4,4,N,"#MARKAMODEL#"
[...]    
P1

The output I was getting with 3.54

N
q750
Q380,20
I8,B,001
A750,20,1,1,2,2,N,"Karta Pojazdu nr:"
A710,20,1,1,9,9,N,"KP6/1267"
A610,20,1,1,4,4,N,"SMART FORTWO I 450"
[...]
A50,20,1,1,3,3,N,""
P1

Now with 3.80 I’m getting a lot of additional lines/special characters in the output and some lines are left blank - see attached text file OutputLabels.txt as I cannot paste these here. Full template file Template.txt also attached

Gone through the change log in StringTheory Complete Documentation but no trace of any changes in this method between 3.54 and 3.80.

OutputLabels.txt (12.7 KB)
Template.txt (656 Bytes)

Hi Greg,

It will probably be useful to see a small example program showing the effect. You’re using variables in the replacement (KAR:NrKartyPojazdu etc) and so those are going to appear in the output. If the output has changed then it’s possible the input has changed as well.

A small sample program, with some small sample data in those variables will likely help identify the source of the problem.

Cheers
Bruce

FWIW I just did a test with some hardcoded replacement values and replaceBetween worked perfectly so I think there is probably something else at play here, especially as the first few lines (which don’t have any substitutions) also show the same behaviour.

cheers again

Geoff

#Edit 1 : if you haven’t already, do some debugging to check before and after the substitutions, something like this:

    ThisLabelLine.SetValue(ASC:Tekst)
    LineNumber += 1   ! defined as a LONG line counter
    ThisLabelLine.saveFile('.\before ' & format(LineNumber,@n07) & '.txt')
    ThisLabelLine.replaceBetween('"','"','#NRKARTY#','KP' & KAR:KodLokalizacji & '/' & KAR:NrKartyPojazdu)
    ThisLabelLine.replaceBetween('"','"','#DATAPRZYJECIA#', FORMAT(KAR:DataPrzyjecia,@D17))      
    ThisLabelLine.replaceBetween('"','"','#MARKAMODEL#', CLIP(MOD:MarkaModel))
    [...]    
    ThisLabelLine.saveFile('.\after ' & format(LineNumber,@n07) & '.txt')
    ASC:Tekst = ThisLabelLine.GetValue()

You could of course just use ThisLabelLine.trace() and use DebugView++ but by writing each line to a before and after file pair you can easily see what is happening with each line after the fact. Again let us know what you find. I would do this before you go to the trouble of providing an example program in case (as I suspect) the problem is not where you think it is…
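
Once the before and after files exist, it can help to look at the raw byte values rather than the rendered text, since the troublesome characters may be invisible. A minimal sketch follows, written in Python only because byte inspection is independent of Clarion. The helper name `odd_bytes` and the sample line are hypothetical and are not taken from the attached files:

```python
def odd_bytes(raw: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) for every byte outside printable ASCII (32..126)."""
    return [(i, b) for i, b in enumerate(raw) if b < 32 or b > 126]


# Hypothetical sample: a template line preceded by a UTF-8 byte order mark,
# which some editors write at the start of a file.
sample = b'\xef\xbb\xbfA710,20,1,1,9,9,N,"#NRKARTY#"'

for offset, value in odd_bytes(sample):
    print(f"offset {offset}: byte {value} (0x{value:02X})")
```

To check one of the saved files, read it in binary mode (for example `open(path, "rb").read()`) and pass the result to the helper. Line endings (13 and 10) are reported as well, so expect those at the end of each line.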

#Edit 2 I think the only change to replaceBetween between 3.54 and 3.80 was on line 3902 of 3.80, which was an optimisation to use the new ReplaceSlice() method. I was originally going to get Greg to revert that line to the old code as it was in version 3.54 to see if it made a difference; however, looking more closely I see that line is only ever relevant when the “replaceAll” parameter is set, which is NOT the case in Greg’s code.

Turns out the problem is not directly related to StringTheory, although something must have changed there to trigger this behaviour. Possibly something related to Unicode support in ST. @vitesse, you mention changes in the ReplaceSlice() method, so possibly some low-level character manipulation in the stMemCpyLeft procedure.

The problem is with some invisible characters stored in the TPS table and passed to the ST object. Clarion does not support UTF-8, although pretty much everything supports Unicode and defaults to it these days. Clarion is an exceptional :wink: tool so it doesn’t. I’m not sure what happens when you copy UTF-8 encoded text into a Clarion entry control, but all national characters with acute accents are correctly mapped to the ANSI code page. That may be handled by Windows, though, as I noticed I had to change the “Language for non-unicode programs” setting in order to display these characters correctly.

My template text file was created by the customer using Notepad. When I open it in Notepad++ it is recognised as UTF-8 encoded. The output file created by Clarion is recognised as ANSI.
Some data in the table (gearbox codes, engine codes, colour codes) is copied and pasted by end users from various websites, and I came across a problem before when exporting data from a TPS file to a MySQL table using the ODBC driver. With some records I was getting “Error: Incorrect string value: ‘\x81 …’ for column”. To solve this issue I created a small function that parses the string and replaces this CHR(129) with CHR(32) (a space). Later on I added a few other characters to be removed. I posted about it in this thread: MySQL Record Insert Error,: Incorrect string value: '\x81
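
CHR(129) is a telling value. Assuming the ANSI code page in play is Windows-1250 (an assumption based on the Polish locale setting, not something confirmed in this thread), byte 129 has no character assigned there, and it also occurs naturally as the second byte of a UTF-8 encoded letter such as Ł. A rough sketch of that check follows, in Python only because the byte arithmetic is independent of Clarion. The function names and the sample word are illustrative:

```python
def undefined_bytes(codepage: str) -> set[int]:
    """Byte values 128..255 that have no character assigned in `codepage`."""
    missing = set()
    for value in range(128, 256):
        try:
            bytes([value]).decode(codepage)
        except UnicodeDecodeError:
            missing.add(value)
    return missing


def stray_bytes(text: str, codepage: str = "cp1250") -> list[int]:
    """UTF-8 byte values of `text` that are unassigned in `codepage`."""
    missing = undefined_bytes(codepage)
    return [b for b in text.encode("utf-8") if b in missing]


# Illustrative sample, not data from the thread.
print(sorted(undefined_bytes("cp1250")))
print(stray_bytes("Łódź"))   # Ł is the two bytes 197, 129 in UTF-8
```

If the stray bytes in a record line up with the unassigned positions of the code page, that points towards UTF-8 text having been stored as if it were ANSI, rather than towards random corruption.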

So before calling SetValue method I clean the string with

    ASC:Tekst = SrcCleanIllegalChars(ASC:Tekst)
    ThisLabelLine.SetValue(ASC:Tekst)

When the string is cleaned from these illegal characters replaceBetween method works as expected.

Here’s the code of my cleaning procedure

    SrcCleanIllegalChars     PROCEDURE  (Tekst)

    Count LONG
      CODE
        Count = 1
        LOOP LEN(CLIP(Tekst)) TIMES     
            IF INRANGE(VAL(Tekst[Count]),0,31) OR INLIST(VAL(Tekst[Count]),'127','129','131','136','144')
                Tekst[Count] = CHR(32)
            END
            Count += 1
        END        
        Tekst = CLIP(LEFT(Tekst))
        RETURN(Tekst)

So basically, if one of the characters above is passed to the ST object, the replaceBetween method produces this result. I noticed that the “illegal” character is usually at the beginning of the string, as users select and copy text from websites, and I was able to fix the problem manually by editing a faulty record and removing the first invisible character. Not sure if that matters.
Interestingly, when I tried to replace placeholders with values from my table using Clarion string slicing the output was correct, so probably Clarion is ignoring these characters, so they become a problem only if passed to something external like MySQL ODBC driver etc.

@Bruce
You may want to look how your ST code treats these characters above as this may be an issue for other users of ST and also SQL Driver Kit which may throw an error if end user pastes some text with these characters from the web. Most likely it would be CHR(129). If you were not able to reproduce the problem, note that the “Language for non-unicode programs” setting may also play a role there. My setting is “Polish”

Hi Greg

glad you have got it working - I have my doubts that the “illegal” characters should make any difference because, as far as ST is concerned in this case a character is just a character. If you could add that “before and after” saving to files and upload a couple of example file pairs, along with your exact code that was executed in between the before and after, then I would be keen to see if I can duplicate it - basically “innocent until proven guilty” :grinning_face:

regarding your SrcCleanIllegalChars, have a look at RemoveChars and RemoveCharRanges. eg. you could have simply said:

st.removeCharRanges('<0>-<31,127,129,131,136,144>','<32>')

note the range from <0> to <31> and the replacement character <32>

if there is no replacement character provided then the specified chars are simply removed.

hth and cheers

Geoff R

Might be a problem if a character is multi-byte. I think StringTheory only sees a character as 1 byte.

Typical LOOP code would use TO.
The Tekst = CLIP(LEFT(Tekst)) will not really CLIP the return if it’s a STRING. Move that to the RETURN.

    SrcCleanIllegalChars PROCEDURE(STRING Tekst) !,STRING
    Count LONG
      CODE
        LOOP Count = 1 TO LEN(CLIP(Tekst))
            IF VAL(Tekst[Count]) <= 31 |
            OR INLIST(VAL(Tekst[Count]),'127','129','131','136','144') THEN
                  Tekst[Count] = '' ! was = CHR(32)
            END
        END
        RETURN CLIP(LEFT(Tekst))

Greg, in order for you to understand this, you need to start being a lot more accurate with your terminology.

Firstly, Strings are collections of bytes, not characters.

Those collections can be organised via quite a few encodings.

There’s no way to look at a string to determine its encoding. But if you do know the encoding then conversion to other encodings is possible.

Most StringTheory methods assume the encoding is ASCII. Some of the methods work on ASCII, utf-8 or utf-16 encodings. Some methods convert encodings.

If you cut and paste text from one encoding into an input that assumes another encoding then you are going to get unwelcome effects.

Doing random character substitutions is not the solution. If you think the string contains the wrong encoding, then convert it (there are methods for that.)

Once you start thinking about this correctly, as a question of encoding, then you will more easily understand how best to approach the solution.
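
As a rough illustration of converting rather than substituting, the sketch below decodes text as UTF-8 and re-encodes it in a single-byte code page. It is written in Python purely to show the idea and does not use StringTheory’s own conversion methods. The source encoding (UTF-8) and the target code page (Windows-1250) are assumptions, and `utf8_to_ansi` is a hypothetical name:

```python
def utf8_to_ansi(raw: bytes, codepage: str = "cp1250") -> bytes:
    """Decode UTF-8 bytes (with or without a BOM) and re-encode them in `codepage`.

    Characters that do not exist in the target code page become '?'.
    """
    text = raw.decode("utf-8-sig")  # "utf-8-sig" drops a leading BOM if present
    return text.encode(codepage, errors="replace")


# Illustrative input: a UTF-8 encoded word preceded by a byte order mark.
utf8_input = b"\xef\xbb\xbf" + "Łódź".encode("utf-8")
ansi_output = utf8_to_ansi(utf8_input)
print(len(utf8_input), len(ansi_output))
```

If the input is not valid UTF-8, the decode step raises `UnicodeDecodeError`, which is itself a useful signal that the assumption about the source encoding is wrong.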