SQLite memory table from multiple threads - Queues too slow

elriba · September 9, 2022, 5:27pm

Hi,
I have been playing/working with Jeff Slarve’s example on how to use a memory based SQLite table.

I was wondering, can I access such a memory based SQLite table from multiple threads. In other words, can I access the records I add in one thread from another thread, like we do in regular disk based tables?

I have a large project that currently uses queues for this work, but I have found that queues become rather slow when there are a very large number of records, and I was thinking that I could use SQLite as a replacement, but I need to access the same data from multiple threads. That is, I need to add/put/get/delete the same information from multiple threads.

jslarve · September 9, 2022, 6:11pm

Howdy Edgard - I have no practical real-world knowledge of how SQLite works within Clarion threads, but I would assume there’s no built-in thread safety for memory tables.

I’d probably create a class to handle the access with an object that has a critical section.

OTOH, It’s possible that your QUEUE manipulation code could be reviewed and optimized.

Stuff I learned by experimentation:

Deleting rows from the top is very slow compared to looping backwards through the queue for the delete.
Adding rows to the top is slower than adding to the bottom.
Maybe there’s other stuff that could be looked at too, but #1 and #2 are night and day changes that might surprise you.

Rick_UpperPark · September 9, 2022, 6:54pm

Also, have you looked at the in-memory driver?

elriba · September 9, 2022, 11:27pm

Hi Rick,
Yes, that could also be an option. I’m going to see what works best.
Thanks

elriba · September 9, 2022, 11:33pm

Thanks for the tips. I need to add to the queue by a multi-column key. And retrieve from it by using that key as well.

I was surprised to see the impact on the code that removing the GET(queue,key) did. At least 3-4 times faster. Obviously when I change that to a memory table I will introduce a delay too, but from what I saw in your example with sqlite memory tables, these things are very fast.

Right now the clarion6 code is taking about 20-30 minutes per run, and the process restarts after it is finished. So I can get it to run up to 3 times per hour on the best scenario.
If I can get it to run in 5-10 minutes that would be very good. I have had to dissect the code, and that has been ugly, but it seems that I can get a large improvement in speed.

Thanks

anon23294430 · September 10, 2022, 12:01am

I dont know about Jeff’s SQL in memory option, and I dont know what driver switches you are using but the driver string MARS will probably need to be switched on, otherwise your multi threaded app still connects via ODBC using a single connection.

In this link, option 2 would probably work for you provided MARS is enabled.
Using SQLite In Multi-Threaded Applications

You dont say how many records in the queue, but it might be that the queues are being stored in the swap file on disk if you are reaching physical ram limits.

Take a look at the Quantum time slicing I mention in the other thread, give more to the foreground and less to the background and you might be able to get it down to that without any code changes!
The values listed here
Master Your Quantum | Microsoft Docs

HKLM\SYSTEM \CurrentControlSet\ Control\ PriorityControl\ Win32PrioritySeparation

Reg setting                 Foreground    Background
0x28, 0x29, 0x2A            18            18
0x18, 0x19, 0x1A            36            36
0x24                        6             6
0x25, 0x14                  12            6
0x26                        18            6
0x15   <--                  24            6
0x16   <--                  36            6

Your reg setting will probably be (if a typical desktop) on 0x26, so if you changed it to something like 0x15 or even 0x16 where the foreground app gets more cpu time, you could see your code completing in the 5-10min window without any extra effort! Just remember after changing the registry, to reboot the machine for the settings to take effect. And if you go into Control Panel, System Icon, Advanced Tab, Performance Settings button, Performance Options Window, Advanced tab, Adjust for best performance of Programs or Background Services, and then click OK to exit the window, this will reset your registry setting changes back to windows defaults, so you might want to read you reg setting before running the timer event code.

FWIW.

CarlBarnes · September 10, 2022, 12:44am

We might have some ideas if you post your Queue design and some of the code for how you build it (Add) and process it (Get).

Like put the largest string on the end as IIRC it gets clipped to reduce size. Sometimes it’s faster to simply ADD(Q) with no order, then SORT(Q,order). Obviously that won’t work (to improve speed) if during build you need to GET(q,order)

vitesse · September 10, 2022, 4:43am

well it should still work. But maybe you mean it would not be any quicker. Clarion keeps track of whether keys are up-to-date or “dirty” and re-sorts if and when necessary. So there is no real need to do a sort before using a queue key.

I would also like to see Edgard’s code to see how it could be improved.

vitesse · September 10, 2022, 4:48am

another option is to use a local “private” tps file which is logged out at open and rolled back before close. So the records are never actually written to your local hard drive.

I did this last century on one project and found it faster than queues. However, like queues, tps transactions can slow down when the number of records get enormous.

elriba · September 10, 2022, 3:41pm

Just a bit of follow-up of what this project does. This project scans the cash register sales as they occur during the day. When it started there were 4 stores I think, and now there are 14 stores.
The code needs to read the transaction tables, which are 3, the header table, a “payments” table, and the rows table. In addition, for each row it needs to load general information such as the item’s category, the item’s tax rules, etc…, and it needs to load the item’s price on the day of the sale (because I could be processing a previous day).

My original code (blushing now ) was to read the header table, and process its payments and rows. For each row I would load all the other information, AGAIN and AGAIN for each row. The reason is that I had a class that did this, so instead of cleaning the code to do it only once per item, I re-used the old code… So, I thought this is the first thing I need to fix. I need to load the general information and prices only once. And then follow up from there.

This is what I have done, and my results.

(1) I started by loading the item’s general information and the item price on the given day into queues, so that I wouldn’t have to re-read this information every time. This was actually a fairly large undertake because the code was a bit complex (and ugly too). I thought this would have a HUGE impact, but was surprised that it did improve, but not as much as I thought it would be. I figured out I had at least two problems:
(1.1.) For reading records that are not changing as you need them, you NEED to use BUFFER. That makes a huge difference. Using a VIEW and a proper index is NOT enough. I thought to myself, “you should have already known this…”. So I fixed that, and that improved the read speed quite a bit. BTW, I didn’t try LOGOUT/COMMIT, don’t know what effect that would have had on the read speed.

(1.2.) Even with the improvement using BUFFER, I then realized that the process of reading a header record, and then its payments and rows was not the most efficient. It would be much faster if I could read each table’s records in sequence: All of the header records, then all of the payments, then all of the rows.

(2) Ok, I will read all the records of each table in sequence. That is faster, but I need to tally information on the header table’s row. How do I do that? “I will use a queue” to hold the header table’s row and process it from the queue.
So, I built a queue which contains the record of the header table, plus some temporary fields to store totals. I thought this should work ok, after all, “queue access will be nearly intantaneous”.
I stripped my code of all disk updates/inserts/deletes, and put timers to try to understand what is taking time in the processing of records. After reading the header records, I wanted to see how much time it would take to read the complete payments rows.

Header   =  900 (this was earyl in the day, and thus there weren't that many header records).
Payments = 4900 (this includes GETting the correspoding header record from the queue, and PUTing afterwards)
Payments = 4554 (this just GETs the corresponding header record from the queue, no PUT)
Payments =  383 (this doesn't GET or PUT the queue).

WHAT!!! GETting the queue record adds a huge overhead. “I wonder if there is a faster way to hold the header records in memory?”. I remembered Jeff’s example, so I went to look at it, and it loads over 150,000 records almost instantaneously. “hmmmm… I need to try this…”

This is the header’s queue.

TransQ                    QUEUE
        Fecha                       DATE
        CodigoTiendaPos             CSTRING(11)    !Store number ID
        CodigoCajaPos               CSTRING(11)    !Cash register ID
        IdTransaccion               LONG           !Cash register's transaction number 
        CostoTotal                  DECIMAL(9,2)   !Total cost from ERP
        Cliente                     LONG           !Client number
        IdSeqEmpleado               LONG           !Employee number
        fueProcesado                BYTE           !Processed?
        NumInconsistenciasGraves    SHORT          !Number of grave inconsistencies...
        fuePosteado                 BYTE           !Posted to the ERP?
        !
        TMP::VentaTotalInventario   DECIMAL(9,2)   !On these variables we tally totals.
        TMP::TotalGrabado1          DECIMAL(9,2)
        TMP::TotalGrabado2          DECIMAL(9,2)
        TMP::TotalGrabado3          DECIMAL(9,2)
        TMP::TotalGrabado4          DECIMAL(9,2)
        TMP::TotalGrabado5          DECIMAL(9,2)
        TMP::TotalPagosPos          DECIMAL(9,2)        !Total de pagos (monto) en la transacción.
        TMP::NumLineas              LONG                !Número de Líneas leídas en la transacción.
        TMP::NumLineasProcesadas    LONG                !Número de Líneas leídas y PROCESADAS en la transacción.
        TMP::NumPagos               LONG                !Número de pagos leídos en la transacción.
        TMP::NumDescuentos          LONG                !Descuentos leídos en la transacción.
        TMP::Graves                 LONG                !Inconsistencias GRAVES en la transacción.
        TMP::NoGraves               LONG                !Inconsistencias NO-GRAVES en la transacción.
        TMP::LineasNoCerradas       LONG                !Líneas que no fueron "CERRADAS" en la transacción.
        TMP::PagosNoCerrados        LONG                !Pagos que no tienen su información "COMPLETA" en la transacción.
                                END

Add code:
ADD(SELF.TransQ,SELF.TransQ.Fecha,SELF.TransQ.CodigoTiendaPOS,SELF.TransQ.CodigoCajaPos,SELF.TransQ.IdTransaccion)

GET.  The same key fields....

I thought that maybe I could speed things up by creating an artificial queue field to use as key, rather than the 5 field key, but thought that this opens the door for a whole range of problems. If one of the memory table techniques work faster, that would be safer.

So - sorry for the long diatribe - that’s when I decided to ask Jeff Slarve here on the forum.

anon23294430 · September 11, 2022, 3:13am

Thats quite surprising, is anything being written to disc albeit just temporary or just held in memory by the driver?

The TPS transaction file never contained much data at all, like a few bytes iirc, which makes me wonder if the transaction logout is using the workstation memory as a temporary buffer.

vitesse · September 11, 2022, 5:36am

yes my understanding is that is exactly what is happening with tps transactions - changes are all done locally in memory and only written to disk on COMMIT, or discarded with ROLLBACK. This is why the tps file is locked for changes by others while the transaction takes place. So in a multi-user scenario keep transactions as brief as possible.

anon23294430 · September 11, 2022, 10:12am

I think with hindsight the Logout timeout for 2 seconds and the retry Attempt number (3x) for tps files in ABC was way too slow, considering the speed of ethernet networks back then. It was the one of the factors that forced me to start using MS SQL as a backend datastore.

I note the comment in the ABC where it says in the abfile.clw,

!In a ‘heavy’ usage network a number > 3 might be useful
!although the machine will go ‘dead’ for longer.

makes me think a bit of feedback here in the class would have been useful instead of the machine going “dead” as they put it.

I always assumed the data was written to the tps file as pseudo temp records and then record shuffling in the tps took place before the keys were rebuilt, never knew it was doing an IMDD type of job.

CarlBarnes · September 11, 2022, 8:42pm

elriba:

TransQ                    QUEUE
        Fecha                       DATE
        CodigoTiendaPos             CSTRING(11)    !Store number ID
        CodigoCajaPos               CSTRING(11)    !Cash register ID
        IdTransaccion               LONG           !Cash register's transaction number 
... 20 more fields ...
                                END

ADD(SELF.TransQ,SELF.TransQ.Fecha,SELF.TransQ.CodigoTiendaPOS,SELF.TransQ.CodigoCajaPos,SELF.TransQ.IdTransaccion)

GET.  The same key fields....

I would change the CSTRING(11) to a STRING(10) so there is no need for the RTL to mess with the trailing garbage after the <0> and 2 bytes smaller.

You have a Queue with 24 fields that I total takes 114 bytes, that does not seem big.

I’d try making an Index Queue used for grouping the data by the 4 fields so its 32 bytes of data to keep in order. You would simply ADD(TranQ) so the new record would be at the end.

IndexQ          QUEUE  
LookupKey         GROUP            !Note is NOT in Order for IdTransaccion
Fecha                DATE
CodigoTiendaPos      STRING(10)    !Store number ID  - note Not "C" String
CodigoCajaPos        STRING(10)    !Cash register ID - note Not "C" String
IdTransaccion        LONG          !Cash register's transaction number 
                  END 
TransQ_Pointer    LONG             !So can GET(TransQ, IndexQ.TransQ_Pointer)
                END

I included an IndexQ LookUpKey field that groups the 4 fields. I’d test it to see if faster. The last LONG will not sort in correct order (due to Little Endian), but will be fine for Grouping. At the end you would Sort the TransQ by the 4 fields to be in correct order.

Since the order does not matter I would put the most unique fields First in IndexQ, i.e. if RegisterID is unique across all stores then put it before Store ID.

IndexQ. = 4 Lookup Fields {Fecha, CodigoTiendaPOS,CodigoCajaPos, IdTransaccion}
GET(IndexQ, IndexQ.LookupKey)
IF ERRORCODE() THEN    !Did Not find in Index
    ! ? CLEAR(TransQ)      Not needed if All fields are assigned,
    TransQ. = 4 Lookup FIelds {Fecha, CodigoTiendaPOS,CodigoCajaPos, IdTransaccion}
    TransQ. = Assign other 21 fields
    ADD(TransQ)  !Note No Fields for order, just added to end
    IndexQ.TransQ_Pointer = POINTER(TransQ)   !Save for GET(TransQ, Pointer)
    ADD(IndexQ, IndexQ.LookupKey)   
ELSE  !Did find in Index so must have TransQ
    GET(TransQ, IndexQ.TransQ_Pointer)
    TransQ. += Accum the 21 fields     !Edit moved into IF
    PUT(TransQ)                        !Edit moved into IF
END
!Edit moved into IF --> Update TransQ Totals
!Edit moved into IF --> PUT(TransQ)

...

At end SORT(TransQ, the 4 fields)

Edit:
Change the IF block to move += Accum into ELSE and so before ADD add do = Assign for all new records. So more code but faster.

Question: Do Store and Register have a LONG unique ID you could use in the IndexQ, the TransQ would use the Strings that sort for humans.

vitesse · September 12, 2022, 12:55am

that seems a pretty good scheme to me Carl, using a separate index queue. It will be interesting to see any speed difference. And I agree regarding avoiding the CSTRINGs.

I have sometimes used a CRC on a group and used that for a key, but it adds some complexity due to needing to allow for clashes.

if there is a lot of tallying it might be worth using LONGs instead of DECIMAL(9,2) and counting by cents instead of dollars (or whatever currency this is) but perhaps the overhead of multiplying by 100 would offset any gains by using LONGs.

but to be honest I am wondering if this queue handling is really the bottleneck at all? Could there be some other disk reads going on that are the real time culprit?

Another (later) thought: You could just use Carl’s index queue and replace the TransQ with a dynamic array. Where Carl stores the pointer, you just increment that by one each time and use that as a record offset. You just use a dynamic string and increase its size each time you add a record.

I would use StringTheory for that as it would handle all the low-level memory allocation for you and basically simplifies everything as you just append the group as a record. (Note at a pinch you could similarly use the built-in systemStringClass but I am less familiar with that.)

You know where any “record” is in the string by simply calculating:
((recNumber - 1) * recSize) + 1

an array will be much more efficient than a queue.

lots of options!

CarlBarnes · September 12, 2022, 2:12pm

Good Point!

Often the Window updating slows down processing. Edgard should look at all code updating the Window or doing a DISPLAY to minimize updates. Remove AUTO from the Window.

I got asked to speed up a 7 hour process moving 100+ files from TPS to SQL. I found that for EVERY Record it was updating the Window with Record Count, Elapsed Time for current File, Elapsed Time for Job. It was getting the DATE and CLOCK twice for every record, then calculating.

My way of doing progress.

Only Calculate the New Progress % at the end of the Timer Event. So after 20 to 50 records where processed
Compare IF New Progress > Current THEN update on window so only every 1% change.

Rather than Calculate with a divide every time:

   Progress = RecordCount / TotalRecords * 100

You can save the divide each record by calculating it once at the start:

    ProgressFactor  REAL
    ProgressFactor = 100 / TotalRecords   !At the top once
    ...
    Progress = RecordCount * ProgressFactor   !Each record

That makes it easy to have a Progress that goes 0 to 25 which is plenty of segments to visualize progress:

   ProgressFactor = ?Progress{PROP:RangeHigh} / TotalRecords   !At the top once

elriba · September 12, 2022, 3:02pm

Hi Carl,
Thanks for these points. I’ll keep these in mind if the memory tables don’t yield an advantage.
Edgard