Table Manager: A class to read tables with less code while optimizing for SQL backends

Tags: #<Tag:0x00007fc0d25055b0> #<Tag:0x00007fc0d25054e8> #<Tag:0x00007fc0d2505420>

Many Clarion applications where developed decades ago, and have lots of code like this (comments show what can be improved):

CLEAR(ORD:Record) 	                              !1 OK
ORD:OrderDate = DATE(10,12,1996)                  !2 It should be together with line 7
SET(ORD:KeyOrderDate,ORD:KeyOrderDate)            !3 No need to write the key name twice
LOOP                                              !4 UNTIL NEXT would be better
  NEXT(Orders)                                    !5 NEXT could be smarter
  IF ERRORCODE() THEN BREAK.                      !6 If it's always there, it shouldn't be there
  IF ORD:OrderDate > DATE(10,28,1996) THEN BREAK. !7 See 2, it can't be optimized for a SQL backend
  IF ORD:ShipState <> 'FL' THEN CYCLE.            !8 It can't be optimized for a SQL backend
  CLEAR(DTL:Record)                               !9 OK
  DTL:CustOrderNumbers = ORD:CustOrderNumbers     !10 OK
  SET(DTL:KeyDetails,DTL:KeyDetails)              !11 See 3
  LOOP                                            !12 See 4
    NEXT(Detail)                                  !13 See 5
    IF ERRORCODE() THEN BREAK.                    !14 See 6
    IF DTL:CustOrderNumbers <> ORD:CustOrderNumbers THEN BREAK. !15 See 8, it's just the opposite of 10
    !Some code
  .
.

Using ABC classes, the loop can be changed to:

LOOP UNTIL Access:Orders.Next()
 ...

But this won’t handle ranges and filters.

A VIEW can be used to handle filters and ranges for both TPS and SQL, but you must declare the view structure, and it needs a string with code that won’t be verified by the compiler:

 OrdersView{PROP:Filter} = 'ORD:Shipstate = ''FL'''

For SQL tables, PROP:Where can be used before the first NEXT to send ranges and filters to the backend, but it also needs a string with code, and you must use the fields’ SQL names:

 Orders{PROP:Where} = 'SHIPSTATE = ''FL'''

Putting all this together, I developed a class that can be used to write the code above like this:

tm TableManager
CODE

tm.Init(ORD:Record)
tm.AddRange(ORD:OrderDate,DATE(10,12,1996),DATE(10,28,1996))
tm.AddFilter(ORD:ShipState,'FL')
tm.SET(ORD:KeyOrderDate)
LOOP UNTIL tm.NEXT(Orders)
  tm.Init(DTL:Record)
  tm.AddRange(DTL:CustOrderNumbers,ORD:CustOrderNumbers)
  tm.SET(DTL:KeyDetails)
  LOOP UNTIL tm.NEXT(Detail)
    !Some code
  .
.  

Now, all variable names used in ranges and filters are validated by the compiler, and are converted to a WHERE clause when using SQL tables.

This is the SQL code generated by Clarion without the class:

WHERE (ORDERDATE >= ? AND (ORDERDATE > ? OR (CUSTNUMBER >= ? AND (CUSTNUMBER > ? OR (ORDERNUMBER >= ? )))))

And this is what the class generates:

WHERE (ORDERDATE BETWEEN '19961012' AND '19961028' AND SHIPSTATE = 'FL')

Method tm.NEXT calls Clarion’s NEXT instruction and returns ERRORCODE() if an error is posted. With TPS tables it also skips records not matching the filters, and returns 1 (tm:Record:OutOfRange) at the end of the range. For SQL tables, all filters and ranges are handled by the back end.

Filters that need to evaluate more than a variable and a value, like this one:

 IF ORD:ShipZip = '33012' OR ORD:ShipZip = '33015' THEN CYCLE.

Can be coded as a logical expression using the class:

 tm.AddFilter('NOT ('&tm.V(ORD:ShipZip)&' = '&tm.S(33012)&' OR '&tm.V(ORD:ShipZip)&' = '&tm.S(33015)&' )')

Method tm.V() returns an internal pointer to the variable; there is no need to BIND it. Method tm.S() formats the value for string comparison (adds quotes). For dates and times tm.D() and tm.T() can be used to format values.

For TPS tables the class will generate and EVALUATE this string:

NOT ( '33064' = '33012' OR '33064' = '33015' )

Where 33064 is the value of field ORD:ShipZip in the record currently being processed.

For SQL, the expression will be added to WHERE using the fields’ SQL names:

AND NOT ( SHIPZIP = '33012' OR SHIPZIP = '33015' )

The class can also be used with queues. This code:

LOOP IQTest = 1 TO RECORDS(QTest)
  GET(QTest,IQTest)
  IF NOT INRANGE(QTest.Number,2000,4000) THEN CYCLE.
  !Some code
.

Can be changed to:

tm.AddFilter(QTest.Number,2000,4000)  
tm.SET(QTest)
LOOP UNTIL tm.NEXT(QTest)
  !Some code
.

In tests, the class has performed correctly. As expected, processing TPS tables and queues adds a few hundredths of second to the execution time compared to traditional Clarion code, but when using SQL tables, time was reduced by over 30% in a real-world data base.

Since the code modified to use the class is very similar to the original, an IDE AddIn or text editor macro can be developed to simplify migrating apps with lots of hand coded loops.

The class uses the TUFO interface, published by Oleg Rudenko and Mike Duglas.

Source code and a test application are available at GitHub.

Feedback is welcome.

7 Likes

Just looking at this portion of your post.
I haven’t looked at your class, so I’m just imagining what’s going on inside the class.

From what I’m seeing there is one TableManager instance for multiple tables
I imagine that your class must repeatedly check and align internal structures to the table being worked on? Wouldn’t it be more efficient to have an instance of TableManager for a single table, thereby eliminating the alignment work.

The first call we see into your class is tm.AddRange(ORD:OrderDate,DATE(10,12,1996),DATE(10,28,1996))
How do you associate that call with the Orders Table ?

You’re calling tm.AddRange(DTL:CustOrderNumbers,ORD:CustOrderNumbers) inside of the loop, but I don’t see anything to remove the previous Range.
Is that a bug or is there some logic that says that adding a range for the same field replaces the previous range?

tmDTL.File( Details ) 

tmORD.File( Orders )
tmORD.AddRange(ORD:OrderDate,DATE(10,12,1996),DATE(10,28,1996))
tmORD.AddFilter(ORD:ShipState,'FL')
tmORD.SET(ORD:KeyOrderDate, tm:ClearKeyForNext)

LOOP UNTIL tmORD.NEXT()
  tmDTL.ReplaceRange(DTL:CustOrderNumbers,ORD:CustOrderNumbers)
  tmDTL.SET(DTL:KeyDetails, tm:ClearKeyForNext)
  LOOP UNTIL tmDTIL.NEXT()
    !Some code
  END
END

tm.File - set a &File property, and clears out any existing Ranges or Filters

tm.ReplaceRange - clears out an existing range for the same field and then adds the new range

tm.Set( Key, Flags ) - the flags are used to prime the components of the key prior to the RLT SET(Key,Key) which relies on values in the table buffer. This method would use file structure properties to CLEAR( KeyComponent_N, +/- 1 ) based on Key{PROP:Ascending, N}

tm.Next() no longer takes a File parameter, as that’s now a property of the instance

Hi Mark

I understand your comments. It may look like there are missing pieces in the code but the class is designed to work without them. Let me explain a little the background and how it works.

I’m currently planning the migration of several accounting related applications. They were originally developed using Clarion Professional Developer for DOS in the early 90s. Each one has hundreds of hand coded reports and processes, with many loops like the example.

One of the goals for this class was that the finished code would be as close to the original as possible, that is, need as few text edits as possible.

Declaring an instance of the class for each table may have resulted in a simpler class, but when the programmer is working in the code, when a new table is referenced, he/she would have to scroll back to the data section, declare the table’s instance, scroll forward to the beginning of the process to initialize it, and then find the line of code he/she was working with. That seems like a lot of work to me (yes, I’m lazy).

The approach was to let the class do most of the “administrative” work at runtime using reflection, at the expense of a small impact on runtime performance but with large savings in coding time. It’s similar to how many libraries work in the .Net world.

With this approach the programmer just declares a tm instance at the beginning and keeps using it for any table in the code.

What you call align internal structures to the table being worked on is just a GET to an internal queue.

This is how it works:

tm.AddRange(ORD:OrderDate,DATE(10,12,1996),DATE(10,28,1996))

At this moment, the class doesn’t know yet what table it’s going to work with. AddRange/Add Filter just saves the Tufo information of the field (ORD:OrderDate) and the values to queue SELF.Conditions

tm.SET(ORD:KeyOrderDate)

Here is where most of the work happens:

  1. With the received key the class uses PROP:File and then PROP:Record to reflect the RECORD structure of the table
  2. Each of the table’s field’s Tufo information, and a reference to the table it belongs to, is added to the queue SELF.Fields.
  3. Adds (or gets if already there) the table to queue SELF.Tables. The key for this queue is INSTANCE(table,THREAD())
  4. Clears any records in SELF.Tables.Ranges and SELF.Tables.Filters
  5. Validates all fields in SELF.Conditions against SELF.Fields, gathering new info, like if the field is or belongs to a group.
  6. If all conditions match fields in the table, moves all records in SELF.Conditions to SELF.Tables.Ranges or SELF.Tables.Filters, otherwise raises an error.
  7. Calls Clarion’s SET()
  8. If the table is SQL, creates a WHERE expression and sends it using table{PROP:Where}

LOOP UNTIL tm.NEXT(Orders)

  1. Gets SELF.Tables
  2. Calls Clarion’s NEXT()
  3. If the table is SQL just returns ERRORCODE().
  4. If the table is TPS
    4.1 If there is an error, returns ERRORCODE().
    4.2 Evaluates each condition in SELF.Tables.Ranges, if any one fails returns tm:Record:OutOfRange
    4.3 Evaluates each condition in SELF.Tables.Filters, if any one fails it cycles an internal loop to get the next record.
    4.4 If all conditions are met, returns tm:Record:Ok (0).

tm.AddRange(DTL:CustOrderNumbers,ORD:CustOrderNumbers)

Adds this condition to SELF.Conditions (it was cleared in the previous SET).

tm.SET(DTL:KeyDetails)

Parses Detail, adds it to SELF.Tables, add its fields to SELF.Fields, and moves records from SELF.Conditions to SELF.Tables.Ranges

LOOP UNTIL tm.NEXT(Detail)

Same as before.

I don’t see anything to remove the previous Range. Is that a bug or is there some logic that says that adding a range for the same field replaces the previous range?

It asumes that all conditions (filters and ranges) are related to the table referenced in the next tm.SET() call. Method tm.SET() replaces table’s conditions with class’s conditions and then clears class’s conditions.

I hope this makes it clearer. I’ll be happy to answer any comment or concern.

Regards,

Carlos

Seems like the Class should have a Tm.Init() method to remove all filters, ranges, etc, to start a new loop using the same object. I think your .ClearConditions does that, but .Init is the typical name.

Looks pretty neat Carlos. Thanks for sharing it.
Having been bitten by the “non-specific return” problem more than once, I thought I’d open my big mouth about the methods with conditional RETURN statements.

If you have something like:
CASE X
OF 1
RETURN x
OF 2
RETURN y
OF 3
RETURN z
END
!What happens if you reach here for any unforeseen reason?
!Your calling procedure might (probably will) receive an invalid result, but not realize it.
So you should RETURN some kind of acceptable result (such as ‘’) as a fail safe. Otherwise you get “weird” stuff that is not easy to debug.

I wish there was a way for the compiler to warn us about this, but all I have is the memory of the bad experience.

1 Like

Hi Carl

Initially I thought it wasn’t needed because tm.SET would take care of that. But, thinking it over, it’s possible for a code path that results in calls to tm.AddRange() without a call to tm.SET between them, so I added tm.ClearConditions. It would safe-programming to call this before adding new conditions.

I thought about calling it .Init, but it seems to me that .Init is used as a pseudo-constructor needed because Clarion’s constructors can’t receive parameters. We should be able to write someref &= new SomeClass(SomeParameter), but we can’t yet. As the constructor of tm doesn’t need any parameter, I decided to name it something else.

I followed what StringTheory does. It doesn’ t have an .Init method, it has a .Start method instead. But to be fair, I did try to call st.Init a few times until a looked up in the documentation the correct name.

So, I will add an .Init method, and document the recommended use as:

CLEAR(ORD:Record)
tm.Init
tm.AddRange(ORD:OrderDate,DATE(10,12,1996),DATE(10,28,1996))
tm.AddFilter(ORD:ShipState,'FL')
tm.SET(ORD:KeyOrderDate)
...

Thanks for the feedback!

Hi Jeff

You are right. And the problem could be worse as the function may be assigned to a uninitialized ANY and GPF later.

I’ll add those fail-safe return values outside the CASE.

Thanks for the feedback!

You can write a constructor that takes arguments in CW
It’s not called .CONSTRUCT, but you can write a method that behaves as a constructor.

SomeClass.NewWithArgs PROCEDURE(LONG xArg1, STRING xArg2)!, *SomeClass
NewInstance &SomeClass
   CODE
   NewInstance &= NEW SomeClass
   NewInstance.Property1 = xArg1
   NewInstance.Property2 = xArg2
   RETURN NewInstance

   ! NOTICE that I do *NOT* use SELF. to touch properties in this method
   ! If would GPF if I did
   ! NOTE: I can use SELF.SomeMethod() here
   ! as long as that method does not touch properties 
   !    (or call other methods which touch properties)

! Then to use it

SomeProc PROCEDURE
MyClass    &SomeClass
   CODE
            ! vvvvvvv--- Yes MyClass is NULL at this point
   MyClass &= MyClass.NewWithArgs( 42, 'Hi There')     
   ! etc.
1 Like

I’ve just updated the source code in github.com. I added two new methods:

tm.Init

Clears SELF.Conditions

tm.Init(ORD:Record)

  1. Clears ORD:Record
  2. Clears SELF.Conditions
  3. If the table Orders (the table to which the record label belongs) has been used in this instance of tm, clears its SELF.Tables.Ranges and SELF.Tables.Filters.

This way, we can replace the line CLEAR(ORD:Record) in the legacy code with tm.Init(ORD:Record).

Also, I added RETURNs after some CASE structures, just in case.

Thanks @CarlBarnes and @jslarve!

2 Likes

Nice work Carlos!

where you have addRange or AddFilter with two value parameters

Using Table Manager , a SQL logical expression is created and sent to the backend:

WHERE (ORDERDATE >= '19961012' AND ORDERDATE <= '19961028' AND ....

for SQL expression consider using BETWEEN.

WHERE ORDERDATE BETWEEN ‘19961012’ AND ‘19961028’

I have not kept up to date but in the past on some SQL’s BETWEEN was faster so was preferred. Maybe that is no longer the case but I guess would not do any harm?

maybe someone with more current knowledge might chime in…

anyway looking at your code, perhaps in SQLCondition where you have:

  pCondition.FieldRef = pCondition.FirstValue
  where = where & CHOOSE(where <> '',' AND ','') & |
    CLIP(SELF.Tables.Groups.Fields.FieldSqlName)&' >= '&SELF.FormatField(SELF.Tables.Groups.Fields.FieldRef,SELF.Tables.Groups.Fields.FieldType)
  pCondition.FieldRef = pCondition.LastValue
  where = where & ' AND ' & |
    CLIP(SELF.Tables.Groups.Fields.FieldSqlName)&' <= '&SELF.FormatField(SELF.Tables.Groups.Fields.FieldRef,SELF.Tables.Groups.Fields.FieldType)

instead try:

  pCondition.FieldRef = pCondition.FirstValue
  where = where & CHOOSE(where <> '',' AND ','') & |
    CLIP(SELF.Tables.Groups.Fields.FieldSqlName)&' BETWEEN '&SELF.FormatField(SELF.Tables.Groups.Fields.FieldRef,SELF.Tables.Groups.Fields.FieldType)
  pCondition.FieldRef = pCondition.LastValue
  where = where & ' AND ' & SELF.FormatField(SELF.Tables.Groups.Fields.FieldRef,SELF.Tables.Groups.Fields.FieldType)

I’m worried by your use of instance (table, thread()) my concern is that you’re using a single instance of tablemanager for all threads. The reason I’m concerned is that you have internal queues but there are no synch objects to make them thread safe.

Hi Geoff

Thanks for the recommendation, done. Not sure if it will be faster, but the generated SQL looks better :slight_smile:.

Github updated.

1 Like

AFAIK, no, I’m not. The global declaration has the TYPE attribute, and I declare a local instance in the procedure’s (or routine’s) data section.

I’m using a single local instance of TableManager for all tables in a procedure or routine.

TableManager.inc

 TableManager        CLASS,MODULE('TableManager'),LINK('TableManager'),TYPE
 ...

TestApp.clw

 INCLUDE('TableManager.inc'),ONCE

TestsTableManager.clw

TestTableManagerTps       ROUTINE
  DATA
ordrecs LONG
detrecs LONG
total   DECIMAL(15,2)
tm  TableManager
  CODE  
  ...

Good job Carlos and thanks for sharing.
I imagine there is a lot of effort invested there. I wanted to provide some constructive criticism as feedback.

Your Class is involved in a sensitive area in any Clarion application, which is datacentric, requiring a lot of care to avoid unforeseen events, and the complexity of the issue makes it very difficult to foresee everything, or that a Clarion version change does not end up affecting something that worked, like for example one that was mentioned lately
Possible BIND bug in clarion 13630
something like that probably won’t affect the code it’s just for example. To facilitate maintainability I believe that where it can be used what Clarion provides should be used, for example I believe that:
SELF.Tables.IsSql = CHOOSE(INLIST(UPPER(pFile{PROP:Driver}),‘MSSQL’,‘ODBC’,‘SCALABLE’,‘SQLANYWHERE’,‘SQLITE’)<>0) it could just be:
SELF.Tables.IsSql = pFile{PROP:SQLDriver}
or was the intention to limit those 5 exclusively? I don’t know if this example could be extended to the CASE used to determine the FieldType and the NUMERIC ISSTRING ISGROUP functions

I understand the initial objective of migrating code from DOS without substantially modifying them, but each case would have to be analyzed, such as the one mentioned
LOOP UNTIL tm.NEXT (Orders)

LOOP UNTIL tm.NEXT (Detail)
Depending on the amount of Orders and Details to be processed, the speed improvement of a Clarion VIEW that internally does the JOIN may justify the substantial modification of the code, mainly in SQL but also not SQL. On the other hand, maybe the Class could be adapted to accept tm.NEXT(OrdersAndDetailsView)? but I understand that it would be complex and laborious and may require a design modification of the class since some reflections could go back to the FILE instead of the VIEW, and the effort may not be justified, you have probably already analyzed it.

In case of using the two nested loops and SQL databases, perhaps it could be optimized if the inner LOOP sent the filter in PROP:WHERE to the server with parameters instead of the fixed but different value each time, so that the query is always the same regardless of the value of the parameter, I understand that the server could have the query precompiled, where some mention that it is more optimal, although in some cases a new plan with new statistics is better, but I guess not for this child file. The problem is that this could only be done with the collaboration of Softvelocity to send parameters, although I honestly did not test if myFile{PROP:WHERE} = ‘myField = someVariable’ or ‘myField = parentField’ if Clarion sent the value as a parameter when being binded SQL and not as a fixed value.

Finally, something that may not be related to the Class but should be monitored in the migration, the use of PROP:WHERE and / or SET(KEY,KEY) in the same thread as a browse or FileDropDowns or FileDropCombos based on te same table was causing
Error Code 78 - Illegal Number of parameters. The last thing I remember was commented on in the Clarion11 group on 2020-11-02 in the topic ‘Re: Size not equal for key’ by Bjørn Øyvind Holm, also on the 6th in the topic ‘PTSS 43119’. You can view the PTSS log for more information and testing programs. They solved using ALIAS there.

Regards,
Federico

Hi Federico

Thank you, I appreciate it.

Yes, I believe the same thing. I was very reluctant to use the Tufo interface, but unfortunately after days of trying to make it work using what Clarion provides, I couldn’t.

What finally convinced me to use the Tufo interface was that Capesoft’s Nettalk uses it in their NetWSDL class.

Perfect, thanks a lot! I remembered that property exists but I just couldn’t find it in the help file. I’ll change the class to use it.

Edit: Done.

Agreed, speed gains of using a view or a joined query could be huge. The plan is to first migrate everything and then optimize where needed.

I’m just starting to think about it, but maybe the class could be enhanced to allow something like:

tm.SET(ORD:KeyOrderDate,DTL:KeyDetails,ORD:CustOrderNumbers,DTL:CustOrderNumbers)

or even:

tm.SET(ORD:KeyOrderDate,DTL:KeyDetails)

To generate code for a joined select query, and then use PROP:Sql and tm.NEXT to loop over the results, keeping the proper values in the files’ buffers. Not easy but not impossible.

Thanks, I’ll keep an eye on this.

I really appreciate you taking the time for this feedback, thanks again.

Regards,

Carlos

I had some issues with BETWEEN in MSSQL when I was working with that a few years back. I believe the issue was if it was a DATETIME with a time component. I was dealing with data that could be datetime stamped within a few milliseconds of midnight and that could cause inaccuracies when using BETWEEN. I found best to use > the day before the from date and < the day after the to date. I seem to recall using >= and <= also caused some inaccuracies.

Some real testing would be needed with datetimes very close to midnight (within 10-20ms either way) to determine what method provides correct results.

I don’t know if Carlos already targeted specifically DATETIME fields, as the project born as a migration from DOS, but in that case it would be better to do as Arnor suggest in the upper side of the range, to be non inclusive, so BETWEEN should be avoided there.

IF DateTimeField >= startDateTime and DateTimeField < endDate

i.e. for today invoices the range would be >= ‘2020-06-01’ and < ‘2020-06-02’

so the inclusiveness or not of the upper range would be a parameter, that wasn’t present for example on the viewmanagerclass addrange method, would be an improvement, a bit more complexity.

For this and other tips look here (on item 3.3) https://wiki.postgresql.org/wiki/Don%27t_Do_This

More comments and example app on the thread SQL ABC browse with complex filter 2020-03-20 comp.lang.clarion David Jung

This is what I understand:

If you need a range A to B for variable X; and A, B and X are all of the same data type (numeric, string, date, etc.); the result of the logical expression is exactly the same if you use:

X >= A AND X <= B

or if you use:

X BETWEEN A AND B

And very probably, for most modern SQL backends, the performance will be identical.

BUT, if A and B are type Date and X is type DateTime, neither will work, because it’s hard to express a time constant close to midnight without jumping to the next day (an approximation is 23:59:59.999999). I guess that’s why @ArnorBld had problem with times close by milliseconds to midnight when using BETWEEN.

For this case, the logical expression should be:

X >= A AND X < (B plus 1 day)

As noted by @FedericoNavarro. The second operator is <, not <=.

This works:

 '20210531' >= '20210501 AND 
 '20210531' <= '20210531' 

This fails:

 '20210531 14:25:58' >= '20210501 00:00:00' AND 
 '20210531 14:25:58' <= '20210531 00:00:00'

This works:

 '20210531 14:25:58' >= '20210501 00:00:00' AND 
 '20210531 14:25:58' <  '20210601 00:00:00'

This fails:

 '20210531 23:59:59.001' >= '20210501 00:00:00' AND 
 '20210531 23:59:59.001' <= '20210531 23:59:59'

This works:

 '20210531 23:59:59.001' >= '20210501 00:00:00' AND 
 '20210531 14:25:58.001' <  '20210601 00:00:00'

Like many discussions here, the topic title prompted me to think about a related Clarion situation regarding DCT tables, optimizing SQL, and less Clarion code.

While defining an SQL view as a table in the DCT is straight forward, on several occasions, I have had the need to browse data returned as a table from an SQL function. Being able to define the returned table in the DCT and then somehow declare a “substitute” FROM clause with parameters would be ideal for leveraging & optimizing code that is best done on the backend.

If I remember correctly it was, as Carlos says, accuracy around midnight and it seemed that it was up to about 20ms after midnight that could fail and end up on the day before.