Just a bit of follow-up of what this project does. This project scans the cash register sales as they occur during the day. When it started there were 4 stores I think, and now there are 14 stores.
The code needs to read the transaction tables, which are 3, the header table, a “payments” table, and the rows table. In addition, for each row it needs to load general information such as the item’s category, the item’s tax rules, etc…, and it needs to load the item’s price on the day of the sale (because I could be processing a previous day).
My original code (blushing now ) was to read the header table, and process its payments and rows. For each row I would load all the other information, AGAIN and AGAIN for each row. The reason is that I had a class that did this, so instead of cleaning the code to do it only once per item, I re-used the old code… So, I thought this is the first thing I need to fix. I need to load the general information and prices only once. And then follow up from there.
This is what I have done, and my results.
(1) I started by loading the item’s general information and the item price on the given day into queues, so that I wouldn’t have to re-read this information every time. This was actually a fairly large undertake because the code was a bit complex (and ugly too). I thought this would have a HUGE impact, but was surprised that it did improve, but not as much as I thought it would be. I figured out I had at least two problems:
(1.1.) For reading records that are not changing as you need them, you NEED to use BUFFER. That makes a huge difference. Using a VIEW and a proper index is NOT enough. I thought to myself, “you should have already known this…”. So I fixed that, and that improved the read speed quite a bit. BTW, I didn’t try LOGOUT/COMMIT, don’t know what effect that would have had on the read speed.
(1.2.) Even with the improvement using BUFFER, I then realized that the process of reading a header record, and then its payments and rows was not the most efficient. It would be much faster if I could read each table’s records in sequence: All of the header records, then all of the payments, then all of the rows.
(2) Ok, I will read all the records of each table in sequence. That is faster, but I need to tally information on the header table’s row. How do I do that? “I will use a queue” to hold the header table’s row and process it from the queue.
So, I built a queue which contains the record of the header table, plus some temporary fields to store totals. I thought this should work ok, after all, “queue access will be nearly intantaneous”.
I stripped my code of all disk updates/inserts/deletes, and put timers to try to understand what is taking time in the processing of records. After reading the header records, I wanted to see how much time it would take to read the complete payments rows.
Header = 900 (this was earyl in the day, and thus there weren't that many header records).
Payments = 4900 (this includes GETting the correspoding header record from the queue, and PUTing afterwards)
Payments = 4554 (this just GETs the corresponding header record from the queue, no PUT)
Payments = 383 (this doesn't GET or PUT the queue).
WHAT!!! GETting the queue record adds a huge overhead. “I wonder if there is a faster way to hold the header records in memory?”. I remembered Jeff’s example, so I went to look at it, and it loads over 150,000 records almost instantaneously. “hmmmm… I need to try this…”
This is the header’s queue.
TransQ QUEUE
Fecha DATE
CodigoTiendaPos CSTRING(11) !Store number ID
CodigoCajaPos CSTRING(11) !Cash register ID
IdTransaccion LONG !Cash register's transaction number
CostoTotal DECIMAL(9,2) !Total cost from ERP
Cliente LONG !Client number
IdSeqEmpleado LONG !Employee number
fueProcesado BYTE !Processed?
NumInconsistenciasGraves SHORT !Number of grave inconsistencies...
fuePosteado BYTE !Posted to the ERP?
!
TMP::VentaTotalInventario DECIMAL(9,2) !On these variables we tally totals.
TMP::TotalGrabado1 DECIMAL(9,2)
TMP::TotalGrabado2 DECIMAL(9,2)
TMP::TotalGrabado3 DECIMAL(9,2)
TMP::TotalGrabado4 DECIMAL(9,2)
TMP::TotalGrabado5 DECIMAL(9,2)
TMP::TotalPagosPos DECIMAL(9,2) !Total de pagos (monto) en la transacción.
TMP::NumLineas LONG !Número de Líneas leídas en la transacción.
TMP::NumLineasProcesadas LONG !Número de Líneas leídas y PROCESADAS en la transacción.
TMP::NumPagos LONG !Número de pagos leídos en la transacción.
TMP::NumDescuentos LONG !Descuentos leídos en la transacción.
TMP::Graves LONG !Inconsistencias GRAVES en la transacción.
TMP::NoGraves LONG !Inconsistencias NO-GRAVES en la transacción.
TMP::LineasNoCerradas LONG !Líneas que no fueron "CERRADAS" en la transacción.
TMP::PagosNoCerrados LONG !Pagos que no tienen su información "COMPLETA" en la transacción.
END
Add code:
ADD(SELF.TransQ,SELF.TransQ.Fecha,SELF.TransQ.CodigoTiendaPOS,SELF.TransQ.CodigoCajaPos,SELF.TransQ.IdTransaccion)
GET. The same key fields....
I thought that maybe I could speed things up by creating an artificial queue field to use as key, rather than the 5 field key, but thought that this opens the door for a whole range of problems. If one of the memory table techniques work faster, that would be safer.
So - sorry for the long diatribe - that’s when I decided to ask Jeff Slarve here on the forum.