A few months ago, I had to develop two features:
- Export and import data to Excel files
- Publish the TSV output of some existing Clarion processes to JSON/XML webservices
For both, I needed a straightforward way to convert between Clarion structures and Comma or Tab Separated Values.
I looked at the excellent CSVParseClass, but it didn’t have many of the features I needed, and I already had parsing code that I wrote years ago. So I decided to create a new class with a different approach (although I used, and still use, the CSVParseClass demo to create queue declarations from sample CSV files).
The class has been in production for months, but I finally had time to clean it up, partially document it, and share it. It’s available on GitHub.
To serialize a queue to a text file:
myQ         QUEUE,PRE(myQ)
id            LONG
Name          STRING(30)
Date          DATE
Time          TIME
            END
fs          FlatSerializer
  CODE
  ...
  fs.Init
  fs.SerializeQueueToTextFile(myQ,'testqueue.csv')
The resulting testqueue.csv text file will look like this:
ID,NAME,DATE,TIME
5,"Some Name",2021-07-30,18:45:56
7,"Another Name",2021-12-16,08:12:34
To load the same text file to a queue:
  FREE(myQ)
  fs.Init
  fs.LoadTextFile('testqueue.csv')
  fs.DeSerializeToQueue(myQ)
For more details, please view the Readme file. Also, the file Tests.clw includes a few unit tests that can serve as examples of how to use all the methods.
The class is named Flat Serializer because CSV and TSV are flat file formats, and because it flattens Clarion structures. For example, a group inside a group will be flattened like this:
MyGroup     GROUP
SomeString    STRING
FullName      GROUP
FirstName       STRING
LastName        STRING
              END
            END
SomeString,FullName,FirstName,LastName
Abc,Carlos Gutierrez,Carlos,Gutierrez
You can use AddExcludedFieldByName() or AddExcludedFieldByReference() to exclude either FullName, or FirstName and LastName, from the output.
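As a rough illustration of the exclusion methods, the sketch below serializes a queue containing a nested group while leaving out the combined FullName column. The queue, the output file name, and the include file name are made up for the example, and it assumes that AddExcludedFieldByName() takes the field label as a string. The Readme is the place to confirm the actual signatures.

```clarion
  PROGRAM
  INCLUDE('FlatSerializer.inc'),ONCE      ! assumed include file name
  MAP
  END
peopleQ     QUEUE,PRE(peo)                ! hypothetical queue for this example
SomeString    STRING(10)
FullName      GROUP
FirstName       STRING(20)
LastName        STRING(20)
              END
            END
fs          FlatSerializer
  CODE
  peopleQ.SomeString         = 'Abc'
  peopleQ.FullName.FirstName = 'Carlos'
  peopleQ.FullName.LastName  = 'Gutierrez'
  ADD(peopleQ)
  fs.Init
  fs.AddExcludedFieldByName('FullName')   ! assumption: the parameter is the field label
  fs.SerializeQueueToTextFile(peopleQ,'people.csv')
```

Going by the description above, this should leave only the SomeString, FirstName and LastName columns in the file. Excluding FirstName and LastName instead would keep the single combined FullName column.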
The class uses the TUFO interface, published by Oleg Rudenko and Mike Duglas.
Feedback is welcome.
Edit Oct. 17, 2021
Following Federico Navarro’s lead, I added a test using a sample file with 100k lines and 38 columns, and made some optimizations. These are the results:
Change | Seconds |
---|---|
Baseline time (first release) | 43.6 |
String reference and slicing when loading file | -6.3 |
Readonly mode and filebuffers | -0.5 |
Precomputed LENs | -10.2 |
DeformatColumnValue optimization | -2.6 |
Pre-resolving field aliases | -5.7 |
Final time | 18.3 |
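To give an idea of what optimizations like "Precomputed LENs" and "slicing" can look like in general, here is a minimal sketch that is not taken from the class. It counts line feeds in a buffer, measuring the length once outside the loop and reading characters with a string slice rather than a SUB() call. All names and the sample data are hypothetical.

```clarion
  PROGRAM
  MAP
  END
buffer      STRING('a,b<13,10>c,d<13,10>')  ! sample data: two CRLF-terminated lines
bufLen      LONG
pos         LONG
lineCount   LONG
  CODE
  bufLen = LEN(buffer)              ! measured once, outside the loop
  pos = 1
  LOOP WHILE pos <= bufLen          ! no LEN() call on every iteration
    IF buffer[pos] = '<10>'         ! single-character slice instead of SUB(buffer,pos,1)
      lineCount += 1
    END
    pos += 1
  END
  MESSAGE('Lines: ' & lineCount)
```

On a file with 100k lines and 38 columns, small per-character or per-field savings of this kind are repeated millions of times, which is why they can show up in the totals.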
I also added a tiny local class, fsDynString (inspired by the StringClass coded in SV’s libsrc\win\xmlclass.inc and TreeViewWrap.clw), to replace ANY as the storage for strings of unknown length. It had no noticeable effect on deserializing performance. It probably helps when serializing, but I didn’t benchmark that.
Edit Nov. 2, 2021
New methods: GetColumnsCount and GetColumnName, to query the structure of the loaded file.
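A possible way to use these two methods together is to list the columns of a loaded file. This sketch assumes that GetColumnName takes a 1-based column index and that the include file is named FlatSerializer.inc. Neither detail is stated here, so check the Readme before relying on them.

```clarion
  PROGRAM
  INCLUDE('FlatSerializer.inc'),ONCE   ! assumed include file name
  MAP
  END
fs          FlatSerializer
col         LONG
  CODE
  fs.Init
  fs.LoadTextFile('testqueue.csv')
  LOOP col = 1 TO fs.GetColumnsCount()
    ! assumption: GetColumnName takes a 1-based column index
    MESSAGE('Column ' & col & ': ' & fs.GetColumnName(col))
  END
```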
Change: GetValueByName now automatically converts dates and times (matching SetDatesPicture or SetTimesPicture, default yyyy-mm-dd and hh:mm:ss) to Clarion standard date and time, and removes commas (thousand separators) from numbers in TSV. This can be disabled by passing fs:DeformatNothing.
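For readers less familiar with Clarion standard dates and times: a standard date is a LONG counting days since December 28, 1800, and a standard time is a LONG counting hundredths of a second since midnight, plus one. The sketch below shows the same kind of conversion using only the language's built-in DEFORMAT and FORMAT functions with the default pictures mentioned above. It illustrates the result of the conversion, not necessarily how the class performs it internally.

```clarion
  PROGRAM
  MAP
  END
stdDate     LONG
stdTime     LONG
  CODE
  stdDate = DEFORMAT('2021-07-30',@D10-)   ! yyyy-mm-dd text to a Clarion standard date
  stdTime = DEFORMAT('18:45:56',@T4)       ! hh:mm:ss text to a Clarion standard time
  ! Format the values back to text to confirm the round trip
  MESSAGE(FORMAT(stdDate,@D10-) & ' ' & FORMAT(stdTime,@T4))
```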
Edit Nov. 7, 2021
New option: SetSerializeUsingAlias (default FALSE): Use the first alias added with AddFieldAliasByReference as the column name when serializing, overriding the field’s label and NAME attribute.
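A sketch of how this option might be combined with an alias follows. The alias text is made up, and the example assumes that AddFieldAliasByReference takes the field followed by the alias string, that SetSerializeUsingAlias takes a TRUE/FALSE value, and that the include file is named FlatSerializer.inc. Treat all three as assumptions to verify against the Readme.

```clarion
  PROGRAM
  INCLUDE('FlatSerializer.inc'),ONCE   ! assumed include file name
  MAP
  END
myQ         QUEUE,PRE(myQ)
id            LONG
Name          STRING(30)
            END
fs          FlatSerializer
  CODE
  myQ.id   = 5
  myQ.Name = 'Some Name'
  ADD(myQ)
  fs.Init
  fs.AddFieldAliasByReference(myQ.Name,'CustomerName')  ! assumed argument order: field, alias
  fs.SetSerializeUsingAlias(TRUE)                       ! assumed to take a boolean
  fs.SerializeQueueToTextFile(myQ,'testqueue.csv')
```

With the option enabled, the header for the Name field would be expected to use the alias rather than the field label.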
Available at GitHub.