How to verify if XML structure is correct in code

Hi all,

I need some help again please.

I have been going round and round the whole day, but it seems I have now overloaded my brain (again).
I ended up with lines and lines of code with IF, ELSE and ELSIF sections that make my head spin, and it still does not work.

I have the line in a StringTheory object that I get from the StringTheory lines queue after I have split the file after load.

<MYREF>XYZ</MYREF>
I need to verify that the name in the opening tag is correct = MYREF

<MYREF/>
I need to verify that, pick up that there is no value, and check that the name in the opening tag is correct = MYREF

<MYREF>XYZ</AMYREF>
I need to verify that the name in the closing tag matches the opening tag: MYREF <> AMYREF

If everything can be done with StringTheory that would be great.
In one block of code if possible.

Any help and ideas would be appreciated.

Regards
Johan de Klerk

EasyXml has some ways to perform a validation:

  • ValidateOnParse property (indicates whether the parser performs validation, and Load method returns false for invalid xml)
  • Validate method (performs runtime validation on the currently loaded XML document)
  • ValidateSchema method (performs a validation against a schema).

Hi Mike,

Thanks for the link.
I will take a look at it.

I have XFiles from CapeSoft for loading the XML file to my Queue for processing.
I just need to check that the tags are correct and valid before loading.

Regards
Johan de Klerk

Hopefully Bruce will come to the rescue… AI can now read ST, although we broke it down into readable parts before AI document storage got big enough…

Hi Johan

I can certainly help with writing ST code, but you mention that you have XFiles from CapeSoft for loading the XML file.

Just make sure you cannot already do what you want with XFiles. The reason I mention this is that some years ago I wrote some ST code for someone and Bruce chimed in that you could do the same already in XFiles (or perhaps it was JFiles?) in just a few lines of code.

IOW don’t reinvent the wheel.

How do you know what the tags should be? Do you have some list or schema that you are validating against?

Similarly with the “empty” fields: I think you are saying you want to verify MYREF against some list or schema.

The third requirement, that an opening tag matches a closing tag, seems straightforward - you probably would use st.matchBrackets and recurse down the structure.

But, as I mentioned, first check if there is already an existing way to do what you are after.

cheers

Geoff R

Edit: I see you asked a similar question in February. Did the various answers there help?

This is an xFiles problem not a StringTheory problem.

Load the XML into a xFilesTree class. Then it’s trivial to walk the tree to determine what nodes are there and what they are named.

If you’re not sure what methods to call then join a ClarionLive Webinar on Wednesday and we can walk through exactly what you are trying to achieve, and what code would do that.
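
To illustrate the "load it into a tree, then walk the nodes" idea in general terms, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. This is NOT xFiles or xFilesTree code, and the names check_record and EXPECTED are made up for the example. It only shows the shape of the approach: a real XML parser rejects a mismatched closing tag at load time, and the node names can then be compared against the expected list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened field list (the real file is said to have 78 fields).
EXPECTED = ["MYREF", "CARNO", "DRIVER"]

def check_record(xml_text, expected=EXPECTED):
    """Return a list of problems found in one <RECORD> element."""
    try:
        # The parser itself refuses mismatched or unclosed tags.
        record = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]

    problems = []
    if record.tag != "RECORD":
        problems.append(f"expected RECORD as the root, got {record.tag}")

    # Walk the child nodes and compare their names, in order.
    names = [child.tag for child in record]
    if names != expected:
        problems.append(f"expected {expected}, got {names}")
    return problems

print(check_record("<RECORD><MYREF>XYZ</MYREF><CARNO/><DRIVER>Bob</DRIVER></RECORD>"))
print(check_record("<RECORD><MYREF>XYZ</XMYREF><CARNO/><DRIVER/></RECORD>"))
```

Note that the empty `<CARNO/>` form and the `<CARNO></CARNO>` form look the same to the tree, which is usually what you want.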

Hi Geoff,

Yes the ideas helped for my JSON file.

Regards
Johan de Klerk

Hi Geoff,

My problem is some of the customers do not match the tags correctly so I want to check this before processing the data in the XML file.
I know how many fields there should be and the tag names so I need to check that.
I need to check if it is correct and any fails:
<MYREF>XYZ</MYREF> ! CORRECT
or
<MYREF/> ! CORRECT
or
<XMYREF>XYZ</MYREF> ! FAIL - Opening tag does not match what it should be
or
<MYREF>XYZ</XMYREF> ! FAIL - Closing tag does not match Opening tag
or
<XMYREF>XYZ</XMYREF> ! FAIL - Opening tag does not match what it should be
and maybe any others that might be wrong

This is the piece of code I have that somewhat works but it can’t detect the closing tag with or without data:

STFL.LoadFile(Loc:FileToImport)
STFL1.SetValue(STFL.Between('<RECORD>','</RECORD>')) ! Only get the first record to check
STFL1.Split('<13,10>') ! Split and add to Lines Queue
STFL1.RemoveLines() ! Remove empty lines
IF STFL1.Records() <> 78 ! Check if the amount of fields is correct
    ! ERROR missing some Fields - Terminate - Notify customer
ELSE
    LOOP XXML# = 1 to STFL1.Records()
        STFL1.GetLine(XXML#)
        IF XXML# = 1
            IF ~STFL1.Instring('<MYREF')
                ! ERROR - Terminate - Notify customer
            END
        END
        IF XXML# = 2
            IF ~STFL1.Instring('<CARNO')
                ! ERROR - Terminate - Notify customer
            END
        END
        IF XXML# = 3
            IF ~STFL1.Instring('<DRIVER')
                ! ERROR - Terminate - Notify customer
            END
        END
    ! DO the rest of the fields
    END
END

Please don’t kill me for my way of coding. 😰

Regards
Johan de Klerk
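
For reference, the CORRECT / FAIL cases listed above can be written as one check per line. The following is a minimal sketch in Python rather than StringTheory, assuming each element sits on its own line and has no attributes. The function name check_line and the message wording are made up for the example.

```python
import re

# <NAME/>  or  <NAME>value</OTHERNAME>, with nothing else on the line.
NAME = r"[A-Za-z_][\w.\-]*"
LINE = re.compile(rf"^<({NAME})(?:/>|>(.*)</({NAME})>)$")

def check_line(line, expected):
    """Classify one line against the tag name it is expected to hold."""
    m = LINE.match(line.strip())
    if not m:
        return "FAIL - not a single element"
    opening, value, closing = m.groups()
    if opening != expected:
        return "FAIL - opening tag does not match what it should be"
    if closing is None:
        # Self-closing form such as <MYREF/>, so there is no value.
        return "CORRECT - empty"
    if closing != opening:
        return "FAIL - closing tag does not match opening tag"
    return "CORRECT"

for sample in ["<MYREF>XYZ</MYREF>", "<MYREF/>", "<XMYREF>XYZ</MYREF>",
               "<MYREF>XYZ</XMYREF>", "<XMYREF>XYZ</XMYREF>"]:
    print(sample, "!", check_line(sample, "MYREF"))
```

The opening tag is checked first, so a line that is wrong in both tags is reported as an opening tag failure.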

I think the way I would tackle that problem would be:

  1. Build a queue that just has the tags in it. I think FindBetween in StringTheory would help with that.
  2. Keep looping through the queue, and if you find a <sometag/> then delete it. If you find a <sometag> </sometag> pair (one after the other) then delete both of them.
  3. Loop through the queue as many times as necessary (which will be based on the amount of XML tag nesting).
  4. If you end up with an empty queue, great, all the pairs match up or are marked as empty. If you have anything remaining in the queue those are your problems. You might want to store the character position of the tag in the original file so you can inform whoever of where the problem is.

I was worried after I wrote this that I had forgotten about tags of this type:
<Audit CreateDate="27 JUN 2023" CreateTime="10:19:41AM" CreateUser="Administrator" CreateVersionNumber="1" ModifiedDate="" ModifiedTime="10:19:41AM" ModifiedUser="Administrator"/>
which I suspect you don’t have. But for the “empty” tags you would only be checking for a ‘/’ at the end of the tag anyway, and that would also identify tags like the above (although they are actually full rather than empty).
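
The four steps above can be sketched as follows. This is Python rather than StringTheory, purely to show the idea, and the name unmatched_tags and the regular expression are assumptions for the example. It does not try to cope with comments or CDATA sections.

```python
import re

# An opening, closing or self-closing tag, with optional attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")

def unmatched_tags(xml_text):
    """Return the (kind, name, position) entries that could not be paired."""
    # Step 1: build a queue that holds just the tags.
    queue = []
    for m in TAG.finditer(xml_text):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue  # Step 2a: <sometag/> needs no partner, so drop it.
        queue.append(("close" if closing else "open", name, m.start()))

    # Steps 2b and 3: delete adjacent open/close pairs until nothing changes.
    changed = True
    while changed:
        changed = False
        i = 0
        while i < len(queue) - 1:
            a, b = queue[i], queue[i + 1]
            if a[0] == "open" and b[0] == "close" and a[1] == b[1]:
                del queue[i:i + 2]
                changed = True
            else:
                i += 1

    # Step 4: whatever is left is a problem (empty list means all matched).
    return queue

text = "<RECORD><MYREF>XYZ</MYREF><CARNO/><DRIVER>Bob</XDRIVER></RECORD>"
for kind, name, pos in unmatched_tags(text):
    print(kind, name, "at character", pos)
```

One thing to be aware of: a bad pair also blocks the tags that enclose it, so in the sample both DRIVER/XDRIVER and the surrounding RECORD tags are left in the queue. The innermost leftover pair is the one to look at first.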

Hello again Johan

Notwithstanding Bruce’s comment that you can easily see what nodes are present with xFiles, I can see you are wanting to validate that your received XML files are in a certain strict format before loading with xFiles.

Whereas line breaks (that are not within an element) are generally not significant in XML - these and indenting are more for human readability - in your case you want each element to be on a separate line.

Whereas elements can usually be in any order, you want your elements to be in a specified order.

Hence you are splitting on CRLF and then checking there are exactly 78 lines corresponding to 78 fields.

You just need to be aware that you might reject perfectly valid XML where it doesn’t conform to your strict layout.

IOW what you are validating is your specific format - not XML in general. Jon’s idea of deleting tags in a queue is a good idea for more general XML.

One thing first - where you have code like:

 STFL1.GetLine(XXML#)
 IF XXML# = 1
     IF ~STFL1.Instring('<MYREF')
       ! ERROR - Terminate - Notify customer
     END
 END

you need to understand that doing STFL1.getLine() returns a value but does NOT put it into the STFL1 value, so the following check for STFL1.Instring('<MYREF') will not do what you think.

what you would need to do is either

  STFL1.setValueFromLine(XXML#) 

or you could do it “long hand”

  STFL1.setvalue(STFL1.getLine(XXML#))

some people prefer to have a separate ST object for the line values either for clarity or if they are going to then do a further split on the line:

  lne.setvalue(STFL1.getLine(XXML#))

Anyway, with that aside, given that you want a strict format with fields/elements in a certain order, perhaps first define the order in your data, not in your code - this will make it much easier to maintain if new fields are added or your strict order is changed etc.

Elements STRING('MYREF,CARNO,DRIVER, ...and so on')

you can then put that in a ST object and split on the delimiter (comma in this case) to give a list and order of elements.

stElt StringTheory

code
  stElt.setValue(Elements)
  stElt.split(',',,,,st:clip,st:left)

remembering you have specified each element on a new line etc., you could then validate your input.

Note we will NOT do st.removeLines so that you can keep all lines so that you can report which line number an error is on.

elements STRING('MYREF,CARNO,DRIVER, ...and so on')
eltNum   long
elt      PString(256)
stElt    StringTheory
st       StringTheory
x        long,auto
inRec    long
errs     long

  code
  stElt.setValue(Elements)
  stElt.split(',',,,,st:clip,st:left)

  if ~st.LoadFile(Loc:FileToImport)
    message('Failed to load file ' & clip(Loc:FileToImport) & |
            '||Error: '& st.LastError ,'File not Loaded')
    ! exit or return etc
  end 
  st.split('<13,10>',,,,st:clip,st:left)
  loop x = 1 to st.records()
    st.setValueFromLine(x)
    if ~st.getValue() then cycle.  ! skip empty lines

    if ~inrec
      if st.getValue() = '<<RECORD>'
        inRec = true
        eltNum = 0
      end
      cycle
    end
     
    if st.getValue() = '<</RECORD>'
      inRec = false
      if eltNum <> stElt.records()
        errs += 1
        message('Was expecting ' & stElt.records() & |
                ' elements but got ' & eltNum, |
                'Error on line ' & x,Icon:Exclamation)
        ! you might want to BREAK to abandon further validation
      end
      cycle
    end
     
    eltNum += 1
    if eltNum > stElt.records()
      errs += 1  
      message('Was expecting ' & stElt.records() & |
              ' elements but found element# ' & eltNum & |
              '||' & st.getValue(), |
              'Error on line ' & x,Icon:Exclamation)
      cycle  ! or you might want to BREAK to abandon further validation 
    end

    elt = stElt.getLine(eltNum)
    if ~st.startsWith('<<' & elt)
      errs += 1  
      message('Was expecting ' & elt & ' element but got:||' & |
              st.getValue(), |
              'Error on line ' & x,Icon:Exclamation)
      cycle  ! or you might want to BREAK to abandon further validation 
    end
 
    if st.getValue() = '<<' & elt & '/>' or |
       (st.startsWith('<<' & elt & '>') and st.endsWith('<</' & elt & '>'))
      ! is valid
    else
      errs += 1
      message('Was expecting either a value like||<<' & elt & '>' & |
              'some value<</' & elt & '>||' & |
              'or a null/empty value <<' & elt & '/>||' & |
              'but got:||' & st.getValue(),|
              'Error on line ' & x,Icon:Exclamation)
      ! you might want to BREAK to abandon further validation
    end
  end ! loop   

  if ~errs
    if ~inrec
      message('File is valid')
    else
      message('Missing final <</RECORD> in file')
    end
  end

Anyway this should point you in the general direction and you can adapt this to your exact requirements.

I have put error messages in the code but perhaps you might choose to log these to a file etc. If you call a centralised error procedure you could put the messages or logging there and have a switch to swap between them.

One thing you will want to consider is if you want to abandon the validation once one error is found or keep going. I have indicated where you can BREAK out of the validation loop once an error is found (ie wherever an error is encountered).

I should add that this code is not tested in any way so caveat emptor and all that - but see how you go.

cheers

Geoff R

Edit 1 - check at end that not inrec (missing '</RECORD>')
Edit 2 - split messages over multiple lines to try to remove horizontal scrolling

Hi Jon,

Thanks for your reply.
Will take a look at your suggestions.

Regards
Johan de Klerk

Hi Geoff,

Thanks for all your suggestions and advice.
That is why I call you the StringTheory GURU.

You are correct.
I want to be very strict how the file should be formatted and the fields that should be in it.
If it is not perfect I want to reject the whole file and not even start the loading.

Thanks for the sample code.
Your advice and suggestions are going to make my code much more readable and shorter.

Regards
Johan de Klerk

By loading it into the xFilesTree, you can easily inspect the tree before loading the tree into the queue or database.

Forgive me for being harsh, but parsing it with StringTheory is the exact wrong tool for the job.