TryInsert takes longer with more records

I agree with most of the comments. Personally, I would use PGAdmin to import the full plain CSV and then write five statements to extract the data from what was imported.

The one I disagree with is “logout/commit don’t speed up SQL inserts”. Yes, they do. The difference between doing 1 million transactions (200K rows * 5 files) and one (or even a thousand) is enormous. In a SQL database, the further you can get away from row-by-row processing, the better. So bulk-loading the original CSV and then populating each of the sub-tables as a single operation will be much faster.
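Something along these lines, as a rough sketch only: the staging and sub-table names (csv_staging, child_a, child_b) and their columns are invented, and x here stands for any table declared against the ODBC/PostgreSQL driver so PROP:SQL can carry the statements.

! Hypothetical names throughout; one set-based statement per sub-table
! replaces 200K row-by-row inserts into that sub-table.
x{PROP:SQL} = 'insert into child_a (cust_no, cust_name) select cust_no, cust_name from csv_staging'
x{PROP:SQL} = 'insert into child_b (cust_no, order_no, amount) select cust_no, order_no, amount from csv_staging'

Each of those runs server-side as a single set-based operation, so the per-row round trips disappear.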

In terms of why things slow down as you get more records: the only real reason for a slowdown is that maintaining the indexes gets slower as the number of index entries grows. But I wouldn’t expect a slowdown of the size you are seeing with such a tiny table. As mentioned earlier, when loading large amounts of data it is common to disable the indexes while loading and re-enable them after the load is done, which causes them to be rebuilt. Same as an append/rebuild on TPS files.
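For example, a sketch of the drop-and-rebuild pattern (the index, table, and column names are invented, and the bulk load itself is elided):

! Drop the index so the load doesn't have to maintain it row by row
x{PROP:SQL} = 'drop index if exists idx_child_a_cust_no'
! ... bulk load the rows here ...
! Recreate it afterwards so it is rebuilt in one pass
x{PROP:SQL} = 'create index idx_child_a_cust_no on child_a (cust_no)'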

Jon - If the inserts necessary for a CSV row are included in a single PG function, I know PG will automatically treat them as a single transaction (i.e. all or none complete). However, if one specifically issues a logout/commit outside of the call to this function (i.e. from within Clarion), do you know whether PG will override the transaction handling of the function and treat many CSV lines as part of a single transaction?

Hi Douglas,

If you are strictly considering what happens “in” Postgres (or any database for that matter), commits only happen when you explicitly commit or do DDL (like creating a table). So inside of a procedure or function, everything you do is part of a single transaction unless you explicitly put a commit in the middle of it.

With Clarion the prime concern is that the default for ODBC is to commit after every statement. Unless you explicitly turn that off temporarily (which is what Clarion will do when you issue a LOGOUT), every statement that ODBC sends will implicitly have a commit after it.

So from Clarion, if you do:

loop
  get_csvrow
  x{PROP:SQL} = 'insert into tablea…'
  x{PROP:SQL} = 'insert into tableb…'
end

then you have two commits per CSV row.

If you do:

loop
  get_csvrow
  x{PROP:SQL} = 'noresultcall do_my_insert(bound csv values…)'
end

then you get one commit (transaction) per CSV row, where do_my_insert does inserts into both tables.

If you do:

logout…
loop
  get_csvrow
  x{PROP:SQL} = 'noresultcall do_my_insert(bound csv values…)'
end
commit

then you get one commit (transaction) in total.

Your third example was my guess. With the surrounding logout/commit, I assume the PG function can still return an error if one of the inserts within it raises an exception. As long as that is true, the Clarion code can stay quite simple, because only a single line is needed within the loop to check for errors.
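For what it’s worth, this is roughly how I picture that loop, as a sketch only. It reuses the x label and do_my_insert from your third example; the ok flag and the ERRORCODE() check are my guesses at the details, not tested code.

  logout(1, x)                            ! one transaction for the whole file
  ok = 1                                  ! ok is a local BYTE, illustrative only
  loop
    get_csvrow
    x{PROP:SQL} = 'noresultcall do_my_insert(bound csv values…)'
    if errorcode()                        ! the single per-row error check
      ok = 0
      break
    end
  end
  if ok
    commit                                ! all rows succeeded: one commit total
  else
    rollback                              ! any failure: undo every row
  end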