Hi everyone,
i have a simple application to process two local TPS files (~5 million records each, 5 keys per file). I need to delete records based on a condition.
The application contains only this single procedure.
Here’s a simplified version of the loop:
Set(FIL:KeyKey)
Loop
Next(MyFile)
If Errorcode() then break.
If Year(FIL:checkDate) < Year(Today())
Delete(MyFile)
End
End
This takes forever (you know) it’s clearly slow.
Any tips or other best practices to improve performance?
Sometimes it’s easier to append good rows to a new file (then BUILD()) than it is to delete rows from an existing one.
You’d definitely want to utilize batches of logout/commit or stream/flush, say every several thousand rows.
If you don’t have to make RTL calls inside the loop, then don’t. e.g., the value of YEAR(TODAY()) could probably be assumed to be the same throughout the loop, so its value could be stored in a variable before the loop.
In deletion loops, I like going backwards. It definitely helps speedwise with queues, and helps my brain with files
ThisYear = year(today())
Stream(MyFile)
Set(FIL:KEYKEY)
loop
previous(MyFile)
if errorcode() then break.
count +=1
if count % 100 = 0
Stateline = count &' records read'
display
end
if year(FIL:checkDate) < ThisYear
delete(MyFile)
delcount += 1
Statedeleted = delcount &' records deleted'
display
end
end
Flush(MyFile)
i will try the Append+Build way and compare results.
DISPLAY is very very expensive time wise. If you have millions of rows, I’d display every thousand, maximum. It will be like night and day to get rid of most of those DISPLAY.
You are doing a DISPLAY every time you Delete so that will slow things a lot.
I’d change from SET(Key) to SET(File). My reason is the Delete changes the Index so I would hope it would be simpler and faster not to use an Index.
I would suggest code like this:
ThisYear = year(today())
Stream(MyFile)
Set(MyFile) !was Set(FIL:KEYKEY)
loop
NEXT(MyFile) !Odd? previous(MyFile)
if errorcode() then break.
if year(FIL:checkDate) < ThisYear
delete(MyFile)
delcount += 1
!Not here ---> Statedeleted = delcount &' records deleted'
!Not here ---> display
end
count +=1
if count % 1024 = 0 THEN !Divide LONG by power of 2 better than % 1000
Statedeleted = delcount &' records deleted'
Stateline = count &' records read'
display
end
end
Statedeleted = 'FLUSH '& delcount &' Deleted'
DISPLAY !In case Flush takes some time update Display
Flush(MyFile)
Rather than calling the YEAR() function 5 million times consider calculating the Start of Year Date so it is a simple compare IF Long < Long
ThisYearJan1 LONG !or DATE if that matches FIL:checkDate ?
ThisYearJan1 = DATE(1, 1, Year(today()) )
...
if FIL:checkDate < ThisYearJan1 !was year(FIL:checkDate) < ThisYear
delete(MyFile)
...
Its easy to show a Progress control. Its a nice visual indicator of the status to watch grow to 100%.
Just add a Progress to the Window with USE() as your Count. Set the Range High to the Records in the file then you don’t have to calculate a percent.
count LONG
ThisYearJan1 LIKE(FIL:checkDate) !LONG or DATE
Window WINDOW('Purge MyFile'),AT(,,216,59)
PROGRESS,AT(8,15,200,12),USE(count) !<-- Add Progress
END
...
open(Window)
?count{PROP:RangeHigh} = Records(MyFile) !Progress 100% count
ThisYearJan1 = DATE(1, 1, Year(today()) ) !calc 1/1/Year Date
Stream(MyFile)
Set(MyFile) !was Set(FIL:KEYKEY)
loop
NEXT(MyFile)
if errorcode() then break.
if FIL:checkDate < ThisYearJan1 !was if year(FIL:checkDate) < ThisYear
delete(MyFile)
delcount += 1
end
count +=1
if count % 4096 = 0 then !Divide LONG by power of 2 better than % 1000
Statedeleted = delcount &' Records Deleted ' & |
INT(delcount/count*100) &'%' !Show % Deleted
Stateline = count &' Records Read'
display
end
end
Statedeleted = 'FLUSH '& delcount &' Deleted'
DISPLAY !In case Flush takes some time update Display
Flush(MyFile)
I added the % Deleted after the count as interesting.
If the file has any MEMO fields then see if using NOMEMO(MyFile) speeds up the read statements. I assume NOMEMO can be used with DELETE, but I don’t know that I have ever done that.
If you were accessing the .TPS across a network then I might mention using the IP driver, if you have it, with a custom server-side procedure.
As this file has the THREAD parameter ON, I’m tented to get years in a Queue (2024, 2023,2022,…) and then initiate a threaded child process for each year.
is that safe to do?
Is the “FIL:checkDate” field the first component of any of the five indexes? If so, use that to read subsets of the file.
What is the percentage of records to delete? If deleting a high percentage then consider that earlier suggestion of APPENDing records to a new file for the records you want to keep and then issuing a BUILD when done.
If this is going to be done “off line” then I’d open the file in exclusive mode, "OPEN(MyFile, 2h + 10h) and no, I would not use multiple threads.
You would need to use LOGOUT on quite small batches (say 1000) to avoid complete lockout of all threads. STREAM is unable to be used unless the file is open exclusive, which means no threads.
Not sure if this would add anything but complexity.