Will Logout/Commit on loop updating records speed up the process?

I have a loop over a table that is updating records. Unfortunately I cannot do that in a PROP:SQL statement because each record can have different values.

It is my understanding that wrapping such a loop in a LOGOUT/COMMIT pair can speed things up. So I tested this and it looks like there is a little improvement in speed. Before I go too deep into this: is my assumption correct that LOGOUT/COMMIT will improve speed? And are there compelling reasons not to use this trick?

The code I have now is:

LOGOUT(1,Lines) 
IF ERRORCODE() THEN Message('Error with logout in Lines. Error: ' & ERRORCODE() & ': ' & ERROR() & ' - State ' & FILEERRORCODE() & ': ' & FILEERROR()) . 

BUFFER(Lines,200)
SET(LIN:Header_Volgorde_fkey)
Lines{PROP:Where} = 'HeaderID = ' & HEA:HeaderID
LOOP
  NEXT(Lines)
  IF ERRORCODE() THEN BREAK .
  IF LIN:HeaderID <> HEA:HeaderID THEN BREAK .      ! Not really needed, just for being sure.

  DO Vul_Norm
  IF ~HEA:WerknemerID THEN LIN:Uitvoering = 0 .
  IF ~NumInspections THEN LIN:Inspectie = 0 .
  IF ~HEA:WerknemerID OR ~NumInspections THEN PUT(Lines) .
END
COMMIT   

Is this the correct way to do it or should I place the Logout statement after the SET?

It will vary a lot by database. For file-based systems, like TPS and even SQLite, the answer is yes. (There are other approaches for SQLite as well, like WAL mode.)

In MSSQL (which is what you are using) there is less of a gain. There is some (it makes each write marginally simpler) but overall it’s not the bottleneck.
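(Not Clarion, but the file-based case above is easy to demonstrate. A minimal Python/SQLite sketch, with invented table and column names, of why grouping the per-row writes into one transaction helps: the journal is flushed once at COMMIT instead of once per row.)

```python
import sqlite3

# In-memory database standing in for a file-based backend like SQLite/TPS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lines (id INTEGER PRIMARY KEY, header_id INT, inspectie INT)")
con.executemany("INSERT INTO lines VALUES (?, ?, 1)", [(i, 7) for i in range(1000)])
con.commit()

# The LOGOUT/COMMIT idea: wrap the whole row-by-row update loop in one
# transaction, so the write-back is journalled once, not 1000 times.
with con:  # BEGIN ... COMMIT around the whole loop
    for (row_id,) in con.execute("SELECT id FROM lines WHERE header_id = 7").fetchall():
        con.execute("UPDATE lines SET inspectie = 0 WHERE id = ?", (row_id,))

print(con.execute("SELECT SUM(inspectie) FROM lines").fetchone()[0])  # 0
```

Timing the loop with and without the `with con:` block makes the difference obvious on a file-backed database.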

As in all your other threads, I’ll mention that the new driver offers some techniques to make it faster [1], but ultimately there are limits to what can be achieved when you need to update each record individually.

With the traditional driver, your best bet is to use a VIEW loop, GETting as few fields as possible, then doing a PUT on the VIEW. The PUT is already minimised, so minimising the GET would be good.

[1] the new drivers support PROP:Project at the file level, thus improving the read speed. They also support doing a PUT -without- a read first, but that’s more useful for imports than for recalcs.

Not sure what you mean by “Unfortunate I cannot do that in a PROP:SQL statement because each record can have different values.”

Given what you have there (and of course I don’t know what is in your Vul_Norm routine), I would probably write it like:

merge into lines t USING
  (select header_id
  , case when coalesce(werknemerID,0) = 0 then 0 else uitvoering end uitvoering
  , case when coalesce(NumInspections,0) = 0 then 0 else inspectie end inspectie
  from header
  where coalesce(werknemerID,0) = 0 or coalesce(numinspections,0) = 0
) q
on (t.header_id = q.header_id)
when matched then update set uitvoering = q.uitvoering, inspectie = q.inspectie;

I’m assuming that Vul_Norm does something to calculate or retrieve NumInspections, so that would have to go into the query inside the merge.

That query would update all lines in the file, so you might want to limit it to a particular headerID or set of headerIDs.


My general principle is that if you can write it in Clarion you can probably write it better in SQL. I suspect that whatever you are doing in Vul_Norm can also be translated to SQL?

This is what ChatGPT did with your code…

Below is a clean, direct T-SQL translation of your Clarion routine as a stored procedure that receives @HeaderID as a parameter.
I’ll first show the procedure, then explain how each Clarion construct maps to SQL so you can validate the logic.


T-SQL Stored Procedure

CREATE PROCEDURE dbo.UpdateLinesByHeader
    @HeaderID INT
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        BEGIN TRAN;

        /* 
           Equivalent of:
           SET(LIN:Header_Volgorde_fkey)
           Lines{PROP:Where} = 'HeaderID = ' & HEA:HeaderID
           LOOP / NEXT(Lines)
        */

        UPDATE L
        SET
            -- Result of DO Vul_Norm
            -- Replace this with the actual logic performed by Vul_Norm
            L.Norm = L.Norm,  -- placeholder

            -- IF ~HEA:WerknemerID THEN LIN:Uitvoering = 0
            L.Uitvoering =
                CASE
                    WHEN H.WerknemerID IS NULL OR H.WerknemerID = 0 THEN 0
                    ELSE L.Uitvoering
                END,

            -- IF ~NumInspections THEN LIN:Inspectie = 0
            L.Inspectie =
                CASE
                    WHEN H.NumInspections IS NULL OR H.NumInspections = 0 THEN 0
                    ELSE L.Inspectie
                END
        FROM Lines L
        INNER JOIN Header H
            ON H.HeaderID = L.HeaderID
        WHERE L.HeaderID = @HeaderID
          AND (
                H.WerknemerID IS NULL OR H.WerknemerID = 0
             OR H.NumInspections IS NULL OR H.NumInspections = 0
          );

        COMMIT TRAN;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRAN;

        THROW;
    END CATCH
END;
GO

Clarion → T-SQL Mapping

1. LOGOUT(1,Lines)

Purpose: Acquire exclusive access.

T-SQL equivalent:
Handled implicitly by BEGIN TRAN + row-level locking during UPDATE.


2. BUFFER(Lines,200)

Clarion buffer prefetching — not required in SQL.


3. SET(LIN:Header_Volgorde_fkey)

Clarion index selection for ordered reads.

SQL equivalent:
Not needed for UPDATE. SQL Server chooses the optimal index automatically.


4. Lines{PROP:Where} = 'HeaderID = ' & HEA:HeaderID

Row filtering.

SQL equivalent:

WHERE L.HeaderID = @HeaderID

5. LOOP / NEXT(Lines)

Row-by-row iteration.

SQL equivalent:
Converted to a set-based UPDATE, which is faster and safer.


6. DO Vul_Norm

Business logic routine.

SQL equivalent:
Must be inlined.
I left a placeholder:

L.Norm = L.Norm

Replace this with the real logic from Vul_Norm.


7. Conditional field resets

Clarion:

IF ~HEA:WerknemerID THEN LIN:Uitvoering = 0 .
IF ~NumInspections THEN LIN:Inspectie = 0 .
IF ~HEA:WerknemerID OR ~NumInspections THEN PUT(Lines) .

T-SQL:

CASE WHEN condition THEN 0 ELSE existing_value END

Only rows that need updating are touched.


8. COMMIT

Mapped directly to:

COMMIT TRAN;

With proper error handling via TRY / CATCH.
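To see the set-based shape (points 5 and 7 above) in action without a server, here is a rough Python/SQLite equivalent of that UPDATE. The data is invented for the demo, and since SQLite stands in for MSSQL here, correlated subqueries replace the UPDATE ... FROM join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE header (HeaderID INT, WerknemerID INT, NumInspections INT);
CREATE TABLE lines  (LineID INT, HeaderID INT, Uitvoering INT, Inspectie INT);
INSERT INTO header VALUES (1, 0, 3), (2, 5, 0);
INSERT INTO lines  VALUES (10, 1, 9, 9), (11, 2, 9, 9);
""")

# One set-based statement instead of the row-by-row Clarion loop:
# reset Uitvoering when the header has no WerknemerID,
# reset Inspectie when the header has no inspections.
con.execute("""
UPDATE lines SET
  Uitvoering = CASE WHEN COALESCE((SELECT h.WerknemerID FROM header h
                                   WHERE h.HeaderID = lines.HeaderID), 0) = 0
                    THEN 0 ELSE Uitvoering END,
  Inspectie  = CASE WHEN COALESCE((SELECT h.NumInspections FROM header h
                                   WHERE h.HeaderID = lines.HeaderID), 0) = 0
                    THEN 0 ELSE Inspectie END
WHERE HeaderID IN (SELECT HeaderID FROM header
                   WHERE COALESCE(WerknemerID, 0) = 0
                      OR COALESCE(NumInspections, 0) = 0)
""")
con.commit()

print(con.execute("SELECT * FROM lines ORDER BY LineID").fetchall())
# [(10, 1, 0, 9), (11, 2, 9, 0)]
```

Line 10 (header with WerknemerID = 0) gets Uitvoering reset, line 11 (header with NumInspections = 0) gets Inspectie reset, matching the Clarion IF conditions.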

Since I first heard of SQL, the popular advice has been to move functionality into stored procedures to improve performance.

Over the past few years though I’ve had cause to dig deeply into SQL and have noticed a groundswell of opinion away from this. (Some links below).

Like with most things SQL, context matters. Ask 5 experts, get 10 opinions.

Personally I’m not a fan, mostly because it separates the “program” into 2 locations (in 2 different languages). This complicates installs, backups, version control, upgrades, debugging, logging, and so on. Affecting me less are issues of scalability and portability and long-term performance.

I’m not saying Stored Procedures are wrong, I’m saying you should understand the implications of them before going down that road.

https://medium.com/binary-notes/why-using-stored-procedures-is-not-recommended-for-modern-applications-41a8b9c17ba4


OK, I’ll bite…

I read the first and last of those things. I was OK with the last, but I’d have to say that he has more positives for stored procedures than negatives. The first one has statements like:
“They violate the separation of concerns - Your database should store and retrieve data.”

That would be like me saying: a house is for sleeping in. So if you are cooking there, having parties there, or whatever, you are using the house wrong. Personally I’d say a database is for managing data. So if you are doing something like getting data from some places in the database and putting it somewhere else in the database, probably a stored procedure is the right tool for the job. For example, I have a little program that we use to manage bus passes for our clients. Each week a procedure looks at the list of clients and the list of people with bus passes, and adds a bunch of tasks like cutting off people who are no longer our clients, adding those who are new clients, etc. A perfect type of thing for a stored procedure – no need to move a bunch of data over to a client program, have it do the manipulation on the client and then send the changes back to the database.

I’d also take issue with: “Just deeply procedural code”. In the past little while I have been looking at some of the procedures in the database I work with (but didn’t build). And yes, the code is deeply procedural. The typical pattern would be:

cursor x is somequery;
for rows in x loop
select a into v_a from sometable t where t.id = rows.some_id;
select b into v_b from someothertable t where t.id = rows.some_other_value;
…
insert into resulttable values(v_a,v_b…);
end loop;

So yes, horribly procedural. If it was properly written it would be more like:

insert into resulttable
select a,b…
from sometable
join someothertable on …
join somequery on …
;
which is probably a lot less procedural than anyone’s application code.
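For what it’s worth, the difference is easy to try in miniature. A Python/SQLite sketch (tables and data invented to match the pseudocode above) of the single insert-select doing what the cursor loop does row by row:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sometable      (id INT, a TEXT);
CREATE TABLE someothertable (id INT, b TEXT);
CREATE TABLE somequery      (some_id INT, some_other_value INT);
CREATE TABLE resulttable    (a TEXT, b TEXT);
INSERT INTO sometable VALUES (1, 'a1'), (2, 'a2');
INSERT INTO someothertable VALUES (8, 'b8'), (9, 'b9');
INSERT INTO somequery VALUES (1, 8), (2, 9);
""")

# Set-based: one statement replaces the cursor loop with its
# per-row "select a into v_a" / "select b into v_b" lookups.
con.execute("""
INSERT INTO resulttable
SELECT s.a, o.b
FROM somequery q
JOIN sometable s      ON s.id = q.some_id
JOIN someothertable o ON o.id = q.some_other_value
""")
con.commit()

print(con.execute("SELECT * FROM resulttable ORDER BY a").fetchall())
# [('a1', 'b8'), ('a2', 'b9')]
```

The optimiser gets to choose join order and access paths for the whole job, which it cannot do when each lookup arrives as a separate statement inside a loop.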

But when you are dealing with small things, like checking that all the ducks are in a row before you submit something complicated from the front end (all the payees exist, you know where to send the cheque, there’s no hold on dealing with any of the payees, or whatever), certainly the front end can handle that sort of logic. Although if it has to send a couple of dozen queries to the database to figure it out, maybe getting the database to do it would work better.

I understand that having code in two different places in different languages can be a bit of a challenge, but just the same as you wouldn’t build a fancy front end with a database tool, you probably shouldn’t be using languages that don’t handle sets to do large data manipulation operations.


Yeah, I’m aware of those arguments and have actually fallen victim to having too much code in the database, but I’ve also learned from it.

There are 3 ways to write code:

  1. Store components separately, where they should be stored, in a structured format that makes sense. e.g. UI, Data, Logic.

  2. All over the place like a mad woman’s breakfast.

  3. Some sort of logical combination of 1 & 2.

I think Clarion lends itself very well to type 3, but a lot of people prefer type 2.

An example of type 1 is the Clarion ABC Classes; it makes perfect sense to convert boilerplate code to a class that helps with DRY programming.

Type 2 code in Clarion is using inline Embeds for your code and repeating the same code in multiple procedures, etc.

My belief is that, if you have code that manipulates data in a batch and is causing delays to users, it makes sense to move that code to a stored procedure. Just be aware, from the beginning, that you could end up with tens, or hundreds of these things, so you should plan for maintainability. Back when I was doing this (at the turn of the century!) I wasn’t aware of techniques in MSSQL to structure this code logically, maybe now there are good ways to handle this number of procedures?

For organization, Oracle has packages. According to the AI summary, SSIS is sort of the equivalent in SQL Server. In the procedural code I have been exploring recently they didn’t use packages, and they also used copy and paste liberally, so they have about 10 times as much code as they should.

For example, in one place there are six “types” of requests that have to be processed to produce payments. From request to payment is the same process for each of these groups with three minor differences. Rather than have one procedure that can deal with those variations, they have six copies of the procedure, and each of those copies call their own version of subsidiary procedures (about five of them). So, 36 procedures where they should have 6.


I always doubted whether it is wise to do an update when using a view. Apparently it is possible, so I tried it. Although the updating works (fields that are updated need to be in the view), it doesn’t increase the speed noticeably.

But good to know that updating records when using a view is possible. So I made the modification anyway, just to learn new things.

I would prefer a PROP:SQL if possible but I think that is hard to make. The problem is that the variable NumInspections is set by the Vul_Norm routine and it can differ for each Line record. I don’t see how this SQL statement can do that. Or am I missing something?

Not really, that routine is working with a queue. I would like to have PROP:SQL statements that can do an update or insert more efficiently, yes, but moving my code to SQL functions is a step too far.

The loop I am discussing here is just part of a whole source procedure where a lot is happening, and I prefer to do that in Clarion to keep a bit of oversight. On the other hand I do have some SQL functions running, but those are solitary functions, like anonymising users who have not been active for over three years.

Absolutely, either a merge or an update can change values on each row to different numbers. In the reply below you say the Vul_Norm routine works off a queue. I’m assuming the values in the queue are pulled from the database, too? For ease of understanding you might put the Vul_Norm calculation into a CTE.
If you want to do it using PROP:SQL without creating a procedure in the database, that’s not a problem. Obviously you can plonk my merge statement (including a Vul_Norm CTE if that is the way you go) into a PROP:SQL, and purpleedge’s update statement likewise, which can live without all of the surrounding procedural begin transaction etc.
You might be looking at something more like this:

merge into lines t USING
  (with vul_norm as
      (select header_id,line_id,count(*) inspect_to_date
       from inspections i
       join lines l on i.idate <= l.dt
       group by header_id,line_id
       )
select h.header_id,v.line_id
  , case when coalesce(werknemerID,0) = 0 then 0 else uitvoering end uitvoering
  , coalesce(inspect_to_date,0) inspectie
  from header h
  left join vul_norm v on v.header_id = h.header_id
  where coalesce(werknemerID,0) = 0 or coalesce(inspect_to_date,0) = 0
) q
on (t.header_id = q.header_id and t.line_id = q.line_id)
when matched then update set uitvoering = q.uitvoering, inspectie = q.inspectie;

Theoretically yes. It is built up from a table as a starter. But then the queue records can conditionally be altered by values based on other tables, and there is also a random function that decides whether to use the queue record or not. So I think the real answer is no.

However, I understand what you are telling me about using a CTE in a MERGE INTO. The key line is
coalesce(inspect_to_date,0) inspectie to set different values for each Line record. I haven’t used CTEs or the MERGE INTO statement before, so I learned something new. I can probably use that CTE concept for some complex views I made on the database, instead of using subqueries.
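If you want to play with the CTE idea outside the database, here is a small Python/SQLite sketch (hypothetical tables, loosely modelled on the inspect_to_date idea above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE lines (LineID INT, HeaderID INT);
CREATE TABLE inspections (LineID INT);
INSERT INTO lines VALUES (1, 7), (2, 7);
INSERT INTO inspections VALUES (1), (1), (1);
""")

# The CTE names the aggregate once instead of repeating a subquery;
# it reads like a temporary view local to this one statement.
rows = con.execute("""
WITH per_line AS (
  SELECT LineID, COUNT(*) AS inspect_to_date
  FROM inspections
  GROUP BY LineID
)
SELECT l.LineID, COALESCE(p.inspect_to_date, 0) AS inspectie
FROM lines l
LEFT JOIN per_line p ON p.LineID = l.LineID
ORDER BY l.LineID
""").fetchall()

print(rows)  # [(1, 3), (2, 0)]
```

The LEFT JOIN plus COALESCE is what lets lines with no inspections come back as 0 instead of disappearing from the result.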

But for speeding up this particular loop it is maybe a bit overdone, because this is just one element in a rather complex source procedure. But I appreciate (as usual) your extensive answers.

Writing to a database always takes time; I’m not sure you’ll be able to speed things up substantially without moving the whole source procedure and functions to SQL as well. You might try using a profiler, like CapeSoft’s, to find any bottlenecks, which might improve things. But you still have the write-to-DB problem.

That is entirely possible, but as you say, and Bruce also mentioned, that comes at a cost as well.

It really comes down to how much you need to improve the speed vs the effort you can afford to put into it.


Understanding database performance, and predicting what will make an impact, is tricky, because there are lots of moving parts. I’m guessing you tested on your development machine, perhaps running the database locally, or perhaps on the LAN. I’m guessing the database was doing very little at the time of the test (other than your test).

Of course conditions out in the field vary wildly, and each difference has an impact. The quality of the server machine (RAM, CPU etc), the overall speed and congestion of the network, the performance of the hard drives, the load being placed on the server, the number of client programs and so on all have an impact.

So there are two approaches to performance.

The first is to figure out when and where it is slow, and spend time optimising that. This is an efficient use of your time because you are attacking meaningful issues. It can be a bit of whack-a-mole but over time, generally, the program will feel speedy. (And you’ll likely learn some new techniques along the way.)

The second approach is to code “efficiently”. Meaning that you write code which is theoretically performant, but which you may never see the result of. For example, the advantage of a VIEW is requesting fewer fields from the server. (The write back is equally efficient regardless of whether you are putting a FILE or VIEW, so that’s not in play here.) The read efficiency will really only be obvious when the database resources are constrained, the database is under load, or the network is constrained.

Most programmers start out with approach 1, learning as they go. Over time they develop habits which bring more and more of approach 2 into play. Approach 1 never goes away, but it becomes the exception, not the rule.

Hence the advice for stored procedures. They take the network out of play. They take the client machine out of play, and move the code “close” to the data. In most places it’s good general advice. However (as mentioned above) there are some downsides. Indeed almost all optimisation comes with some sort of downside.

For example switching to a Clarion VIEW from a FILE loop comes at a cognitive cost for the programmer. FILE loops are so easy that we just got used to them, and seldom moved to VIEWs. (The new drivers acknowledge this, and move all the VIEW efficiency back to the FILE, so one can still write FILE loops, but at the same time be efficient.)

Ultimately though, customers will tell us where things are slow on their server, with their network, and then we can figure out new approaches in those specific situations.


As usual, wise words I will keep in mind.

Thanks to ClarionHub I certainly did over the last few weeks. I am making steps in optimising this big source procedure. :slightly_smiling_face:
