How to import CSV into Clarion 11 TopSpeed file without writing code

In C6 (I haven't tried it since), I used to be able to create a Clarion DCT table where the data source was ASCII, set the field delimiters and point the filename at the CSV file's path, so the CSV file could be used as a data source in a Clarion app.

From there, making a simple Clarion app that opened the CSV file before copying the records into a TPS or SQL database was optional; the app wizards would make it a 5 or 10 minute job, depending on whether I had to process the data in some way, like removing duplicates, concatenating fields, or splitting fields (e.g. a fullname field into firstname, middlename and surname fields, or vice versa).

If you are importing a large CSV file, say a million or even a billion+ records, and it's going into a SQL database where MS SQL Server is the ultimate data repository, then SQL Server has a built-in import option which is the fastest to use. It bypasses any record and table locking which could be in effect.
Importing and Working with CSV Files in SQL Server (sqlshack.com)

By comparison, importing very large CSV files into a SQL server using a Clarion app (or any other app) would take hours instead of minutes. The length of time could be reduced by using LOGOUT and COMMIT to import records in batches, but in my tests everything was slower unless I used the built-in MS SQL Server flat-file import method detailed above. I'm sure this applies to other SQL servers, not just MS SQL Server, but some of the CSV files I had were massive, literally hundreds of millions of records in a single file.
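For what it's worth, the LOGOUT/COMMIT batching I mean looks roughly like this (an untested sketch; the Csv and Customers labels are made up and assumed to be declared elsewhere):

Batch  LONG
  CODE
  SET(Csv)                          ! start at the top of the CSV source
  LOGOUT(2, Customers)              ! open a transaction on the target table
  LOOP
    NEXT(Csv)
    IF ERRORCODE() THEN BREAK.
    Cus:Name = CSV:Name             ! ...plus the rest of the field assignments
    ADD(Customers)
    Batch += 1
    IF Batch >= 1000                ! commit every 1000 rows, then start a new transaction
      COMMIT
      Batch = 0
      LOGOUT(2, Customers)
    END
  END
  COMMIT                            ! commit the final partial batch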

Hi Waldo,

To best answer this question, it’s worth taking a simple step backwards.
The dictionary is no more than a repository where you declare file structures.
The templates use the dictionary as input in a variety of places.

But that does not mean you are limited by the dictionary, or the templates. You are very able to declare a table structure (FILE) as a local data structure in any procedure, by hand. So it’s completely possible to declare a local structure, based on say the ASCII, BASIC or DOS drivers, and then code a CSV import from there. No dictionary, or template, required.
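As a rough, untested sketch of that (the driver choice, field names and sizes here are only placeholders), a hand-declared BASIC-driver FILE is read like any other table:

CsvName   STRING(260)                      ! full path to the CSV, set at runtime
People    FILE,DRIVER('BASIC'),NAME(CsvName),PRE(CSV)
Record      RECORD
FirstName     STRING(50)
LastName      STRING(50)
Email         STRING(100)
            END
          END
  CODE
  CsvName = 'c:\temp\people.csv'
  OPEN(People, 40h)                        ! read-only, deny none
  IF ERRORCODE() THEN RETURN.
  LOOP
    NEXT(People)
    IF ERRORCODE() THEN BREAK.
    ! CSV:FirstName, CSV:LastName and CSV:Email now hold one row;
    ! copy them into your TPS or SQL record and ADD from here
  END
  CLOSE(People)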

If you do want to use a template, then you get to declare the structure in the dictionary, simply because that is where the templates get their structures from. The one is input to the other.

Of course declaring specific FILE structures for each CSV you might want to import seems like a lot of work. In that case you can treat the FILE as a “block of text” and then simply parse that block using the Clarion String Handling functions (Instring, sub etc.) Using this approach you can re-use a single structure all over the place with minimal effort.
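A sketch of that re-usable approach (again untested, and the sizes are arbitrary) is a single-field structure read line by line, with the parsing done in code:

CsvName    STRING(260)
AnyCsv     FILE,DRIVER('ASCII'),NAME(CsvName),PRE(GEN)
Record       RECORD
Line           STRING(2000)                ! one whole row of the CSV
             END
           END
Comma      LONG
Field1     STRING(100)
  CODE
  OPEN(AnyCsv, 40h)
  LOOP
    NEXT(AnyCsv)
    IF ERRORCODE() THEN BREAK.
    Comma = INSTRING(',', GEN:Line, 1, 1)  ! crude parse of the first column only;
    IF Comma                               ! real code loops over all the columns
      Field1 = SUB(GEN:Line, 1, Comma - 1) ! and honours quoted values
    ELSE
      Field1 = CLIP(GEN:Line)
    END
  END
  CLOSE(AnyCsv)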

Of course the DOS driver requires the file to be loaded in blocks, which then have to be stitched together before (or during) parsing. By contrast the Windows API lets you read the whole file into a single string with minimal code. This is a bit faster, and removes the need to parse rows that span blocks, BUT it limits the size of CSV you can import (probably something around a gig becomes the practical maximum).

If you do this a couple of times you will eventually make your own higher-level string functions to make things easier. You can build these directly on the Clarion string commands, or on the SystemString class.

Because I do this sort of work a lot, and because we deal with even more complicated CSV formats (EDI, HL7 etc), I maintain a string library to do all this work for me. So doing these imports typically takes no more than a handful of lines, plus one line per field. The string library I maintain is called StringTheory (the one Jeff linked to above). It’s the premier string handling library used by most Clarion programmers to do, well, pretty much everything string related.

So the good news is, you have lots of choices and approaches you can take. Which one you prefer will depend on your requirements and also your willingness to write code, or use code written by others.

Cheers
Bruce

Definitely. If you are importing directly into a table, and the backend has the functionality to do this, then it makes sense to do it there. This will always be the fastest method.

So then, importing the SQL table which now holds the CSV data into the Clarion DCT is the other route which could be taken. I think we need some sort of sat-nav for navigating routes in IT. :grinning:

TopScan has an Export to CSV that I often use, I agree it would be very useful if there was also an Import.

My method is not "code free" but it is mostly generated. Define the CSV file in the DCT (see the import below). The DCT lets you browse the file to check it's right. For a one-time convert use the DCT Conversion, or if you'll be doing this often…

In an APP generate a Process to read the CSV and Add to the TopSpeed. You'll have to code those assignments and convert types. To help, you can wizard a Browse/Form of the CSV file to view it before import. I made an example.

Use the Import Tables in DCT to aid in defining the CSV file.

The import does a good job of picking types, but you'll need to review them. If you don't check "First Header" the field names will be LABEL1, 2, 3… All the data types will be STRING and the size will be increased to fit the header label. E.g. "Benefits Required" is Yes/No, so with the header box checked it is sized as STRING(3), but without it becomes STRING(18).

You will want to change the File's FileName property because it will have the path picked in the import wizard, e.g. C:\xxxx\xxx\file.csv.

Now you have the CSV File and in an APP you can generate a Process to read it and add code to write your TPS.
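The embed code in that Process is normally just assignments plus an ADD, something like this (labels invented for illustration):

  ! in the Process's "activity for each record" embed, after the CSV row is read
  TPS:FirstName = CSV:FirstName
  TPS:LastName  = CSV:LastName
  TPS:Joined    = DEFORMAT(CSV:Joined, @d2)   ! convert STRING dates using a picture
                                              ! that matches the CSV's format
  ADD(TpsFile)
  IF ERRORCODE()
    ! decide whether to skip, log or stop on duplicates and errors
  END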

If this is just a 1 time conversion you can use the DCT Conversion. Right-click on the TPS file and choose “Generate Conversion Program”, then pick From Table, then Pick the CSV file. It can do a “Code Free” Auto Convert if your column names match.

SQLite offers an easy way to create a DB table from a CSV with its SQLite3.exe tool. You can get it from SQLite.org. Search the web for "SQLite Import CSV" for many tutorials.

You run SQLite3.exe and issue these Dot commands
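Typically something like this (the database name, CSV path and table name here are just examples):

sqlite3 mydata.db
.mode csv
.import c:/temp/mydata.csv MyTable
.quit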

The fields all end up as TEXT, which the Clarion DCT import defines as STRING(20). The tool has a lot of options so I may have missed something. There are ways to configure delimiters, quotes and EOL.

Now you have an SQLite file you can import into TPS.

Thanks for the advice.

As a project and learning tool for Clarion 11, I figured I would try to make my life easier at work by creating an app/dashboard: I run a query against a database, export the results to CSV or Excel format, and then import them into a Clarion app. There will probably be no more than 50 records a day imported. I would also like to append the records rather than overwrite them, so I have a history of events to go back and look at if I need to.

I am also looking for a way to run the query for the info I need from the app I create in Clarion 11. I can already call the query up from Google Sheets, but I am not a big fan of the product, even though I can manipulate it to make the results look the way I like.

Anyway, that is what I am trying to do for this project, I figured it would be a good learning tool for myself.

Also did I forget to mention you guys are Awesome!

I will experiment with the recommendations that you guys have provided, thanks for the quick responses.

Regards,

99% of the data I will pull are strings, thanks for the advice.

Then you'll want the CSV file defined in your DCT. You can wizard a Browse/Form to preview the CSV before import. You'll want to use a File Loaded browse to work around the bug I show in my example with the header row and Previous.

And if you want some privacy of sorts, redirecting command-line (console) output to txt files that you then import into your Clarion app can be another way to keep checks/logs on layouts and such before you even get to the point of importing a CSV file (there's a small RUN() sketch after the redirection list below).

Edit: I should make it clearer: 1 = STDOUT, 2 = STDERR.

STDERR - error messages from commands go to this pipe before being displayed (think of Clarion's HALT).

STDOUT - everything else goes to this pipe before being displayed.

Redirect output to file - Overwrite
[command] > [path\filename]
1 = STDOUT

Redirect output to file - Append
[command] >> [path\filename]

Redirect output to file - overwrite, redirect stderr to file - overwrite
[command] > [path\filename] 2> [path2\filename2]
1 = STDOUT
2 = STDERR

Redirect output to file - append, redirect stderr to file - append
[command] >> [path\filename] 2>> [path2\filename2]
1 = STDOUT
2 = STDERR

Redirect output to file - overwrite, redirect stderr to file - append
[command] > [path\filename] 2>> [path2\filename2]

Redirect output to file - append, redirect stderr to file - overwrite
[command] >> [path\filename] 2> [path2\filename2]

Redirect output to file - overwrite, redirect stderr to stdout (like a clarion Reference variable ie target)
[command] > [path\filename] 2>&1

Redirect output to file - append, redirect stderr to stdout (like a clarion Reference variable ie target)
[command] >> [path\filename] 2>&1

Haven't tried this, don't know how it would be handled.
Redirect output to file - overwrite, redirect stderr to same file - append
[command] > [path\filename] 2>> [path\filename]

or this one.
Redirect output to file - append, redirect stderr to same file - append
[command] >> [path\filename] 2>> [path\filename]
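And if you want to kick the command off from inside your Clarion app rather than a batch file, a rough sketch (the command and file names are only examples) is:

  ! run via cmd.exe so the > redirection is honoured, and wait for it to finish
  RUN('cmd.exe /c dir /b *.csv > layouts.txt', 1)
  ! layouts.txt can now be opened with the ASCII or DOS driver and checked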

I am able to import the text into the library, but the "first line as Headers" check box only imports the first line as headers and no data; without checking it I can import the data when building the library. But that is okay, I am getting the skeleton built and have a good starting point. Clarion actually crashes when I check the box on some files; I tested several CSVs and they only show data when the box is unchecked. Building the app gives the same result.

The CSV data must conform to the spec i.e. have a single heading line with all the columns. Then the next line is data. You may have to edit the file to allow the DCT to read the spec.

Post a sample of the first 5 lines or so of the file. Either post a screen cap of the data in Notepad, or paste it, select it and click the Code </> button, e.g.:

Code,Value,Data Type
000,English,char(3)
001,Spanish,char(3)
002,Greek,char(3)
003,Italian,char(3)

Try loading the CSV in something like Notepad++ to see what the line terminators actually are. You can't always rely on the 1st row having the same line terminator as the rest.

One of my early jobs was importing reports from mainframes around the country into Lotus 1-2-3, so we had all sorts of line-terminator differences to cater for, depending on which mainframe the report came from, whether it was a report sent to file or an export file of sorts, or whether it was the terminal program capturing the screen info from the mainframe we were remotely connected to.

It might pay to load the file using the DOS file driver first, and maybe scan for any bare <CR> and replace it with <CR,LF>, i.e. <13> to <13,10>. The BASIC driver uses CTRL+Z as a line terminator, so maybe scan for any omissions before importing. You could also scan for any bytes which are not going to work with the BASIC driver.
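A rough sketch of that kind of pre-scan with the DOS driver (untested, labels invented) might be:

RawName   STRING(260)
RawFile   FILE,DRIVER('DOS'),NAME(RawName),PRE(RAW)
Record      RECORD
Block         STRING(4096)
            END
          END
X         LONG
BareCR    LONG
  CODE
  OPEN(RawFile, 40h)
  NEXT(RawFile)                              ! the first block is enough to sniff line endings
  LOOP X = 1 TO BYTES(RawFile) - 1
    IF RAW:Block[X] = '<13>' AND RAW:Block[X + 1] <> '<10>'
      BareCR += 1                            ! a CR that is not followed by LF
    END
  END
  CLOSE(RawFile)
  ! if BareCR is non-zero the file probably has Mac-style endings and needs
  ! rewriting (or /ENDOFRECORD tweaking) before the import will behave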
Looking at the class may also give you an idea of what it's looking for, because it's not SterlingData's Impex (Import/Export) add-on. IMPEX Clarion Template (sterlingdata.com) That add-on is more robust and flexible when it comes to importing and exporting.

I thought that might be non-13,10 line endings, so I tried a Unix (10) and a Mac (13) file. The wizard shows warnings below. The Mac file does not work and does not make a DCT file. The Unix file works, but did not set the driver options to /ENDOFRECORD=10 /FIRSTROWHEADER=ON.

(screenshots of the import wizard warnings)

Even though I configured the driver for 10, the Data Browser only shows the header:
(Edit: this was wrong, should have been /EndOfRecord=1,10 then everything worked.)

(screenshot: Data Browser showing only the header row)

The 13,10 file works

I am able to create a Dictionary and import the headers and data only after I do the following:

  1. Download the raw query of information to a CSV format from the database that I use every day.
  2. Import the CSV into OpenOffice, then export it to a .CSV (I would like to skip this step). The raw .csv does not build the data when creating the dictionary.
  3. Open Clarion 11 and import the table into a dictionary; I have to redirect some of the selections to point to the file.
  4. Create a new Clarion project and build an app that has only the one Browse.
  5. Run it; the header is there plus the data. I do check that the data is there during the library build.
  6. Try stuff like fuzzy search to test the file; it works on the CSV file format as well.
  7. The next step is to find out what the CSV in step 1 is missing (or does not have) in order to skip the import/export step that I have to do with the spreadsheet.

I can also start working with the data, pushing the successful CSV Clarion 11 browse into a TPS format so I can manipulate the data the way I need to see it and give it functionality.

In testing, once I had the data/headers/Browse built, I could change the data in the OpenOffice-created CSV file with Notepad by adding, deleting, pasting and creating new lines of data, and it shows up in the Browse created by the app Clarion generated. I do not have to run it through OpenOffice again; the CSV can be modified and the changes are reflected in the app the wizard created.

So this is where I am at for the moment. Two challenges lie ahead: the first is to make the exported file compatible with the Clarion dictionary without having to push it through a spreadsheet and then save it to CSV.

The second objective is to create a TPS that I can move the imported CSV data into, so I can manipulate the data or create the program. I know this may need some hard coding, but I will experiment and see what I can come up with. I am sure I will have more questions.

regards

Zip the file and attach it here or in a message to me. It should be easy to see the problem.

I tested CSV import into the DCT more and found the driver options must be configured right in the import wizard. Those settings are lost and must be entered again into the DCT file.

Yes, that would make it easier for everyone to help you - and chances are (as others already mentioned) it could be the line endings.

Incidentally, I also think StringTheory is the way to go… and if you used that there is a LineEndings method, so it becomes a non-issue:

https://www.capesoft.com/docs/StringTheory3/StringTheory.htm#stLineEndings

If you are writing code then a class like ST works well. This poster wanted to not write code.

You can read any CSV File the way StringTheory can by declaring a generic BASIC file with a 1 field array as STRING(256),DIM(32). If you need more than 256 bytes or 32 columns then simply increase those numbers. It can be configured at runtime for any End of Line, Quote or Delimiter with SEND() Driver strings.
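The declaration is roughly this (untested sketch; the names are placeholders, and the driver strings are the ones mentioned earlier in the thread):

CsvName   STRING(260)
Generic   FILE,DRIVER('BASIC'),NAME(CsvName),PRE(GEN)
Record      RECORD
Field         STRING(256),DIM(32)           ! bump the size or DIM for wider CSVs
            END
          END
  CODE
  CsvName = 'c:\temp\whatever.csv'
  SEND(Generic, '/FIRSTROWHEADER=ON')        ! driver strings set at runtime;
  SEND(Generic, '/ENDOFRECORD=1,10')         ! e.g. this one for LF-only files
  OPEN(Generic, 40h)
  LOOP
    NEXT(Generic)
    IF ERRORCODE() THEN BREAK.
    ! GEN:Field[1] .. GEN:Field[32] now hold the columns of one row
  END
  CLOSE(Generic)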

You can have a Browse to view the file. You can generate a Process to read the CSV and write any file (e.g. TPS), where you just have to write the assignments and the ADD. It will have a progress bar to give user feedback on large files.

The BASIC driver handles the CSV spec (RFC 4180) doubled quote characters. E.g. "9 "" Nails" should come out as 9" Nails, but ST returns 9"" Nails. I guess in ST you need to do a Replace('""','"') on each GetLine()?

Example above attached.

CsvArrayExample.zip (66.9 KB)

I work on code that uses the above array-type file often and it works perfectly for reading files. If you are writing files it does write all 32 columns; while it's ugly to see all those empty columns in Notepad, if you open it in Excel you'll never see them.

Tip:
Given how well Import works in the DCT (shown several posts up), the next time I need to write a CSV from my program I can create a prototype file in Excel. I set the Row 1 headings to my desired DCT field names, add a row of real sample data, Save As CSV, and import that into the DCT, and I'll have my BASIC file defined.

I wrote a CSV parser that I keep meaning to clean up and put on github.
One of these days. It’s a very different approach.

You still load the whole CSV into a string, but the only additional memory allocated is for a bunch of 8-byte &STRING references. There are no copies needed of the data itself. The only NEW’d data other than the CSV is a single allocation to house all of the &STRINGs. So for huge CSV, you need a lot less memory to parse it out than you would via Split().

Interesting, Jeff - I had a conversation with someone about something similar maybe a year or so ago - most probably Bruce, in relation to possible future StringTheory optimizations. Of course doing that with ST would break much of the existing code base, as often you want to change the lines queue without it affecting the main string value (until you later do a Join to combine the lines back into the string value). So it is a case of swings and roundabouts… but it would be good for large files where you cannot effectively fit two copies into memory.

Do you use an array for the &STRING references or a queue? An array would be quicker, and you could use a dynamic string underneath it. Ha ha, you could even use an ST object for that (grins).