Create a file layout from a CSV file

Tags: #<Tag:0x00007fc0de881200>

Over on the Clarion 11 newsgroup Steve asked:

Is there a quick way to get the file layout of a .csv file?

I want to do some data processing on the csv and a .tps file.

A couple of people quickly mentioned you could just import the file into the dictionary, although when setting the column widths it only checks the first 100 rows.

Wolfgang mentioned:

if you know the columns and if you know the delimiter, you could write
a tiny tool with StringTheory, which parses the .CSV in advance an then you can
estimate the required column widths from StringTheory.

You could even make that tiny tool to generate the tablestructure for you.

Takes some work to tinker such a tool. But who does not like to do some extra
work to save some extra work?

I responded that:

I had the same thought - but of course the irony is that if you have StringTheory then you don’t really need the table structure!

Of course if you were going to then import the csv into another format like tps or some SQL database then knowing the field lengths would be useful.

so with that in mind I wrote a function CreateLayoutFromCSV that does exactly that…

as usual I used my favourite Clarion tool StringTheory (ST) which is ideal for any kind of slicing and dicing involving strings.

the split method of ST (StringTheory) makes parsing CSV very simple as documented at:

https://www.capesoft.com/docs/StringTheory3/StringTheory.htm#ParsingCSVFile

there was recently some talk on the CW-Talk Skype group about compliance with RFC 4180.

this is a “standard” for CSV’s and Wikipedia mentions for RFC 4180:

  • MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).

  • An optional header record (there is no sure way to detect whether it is present, so care is required when importing).

  • Each record should contain the same number of comma-separated fields.

  • Any field may be quoted (with double quotes).

  • Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly.)

  • If double-quotes are used to enclose fields, then a double-quote in a field must be represented by two double-quote characters.

anyway the sample code linked above in the ST docs currently (Vers 3.38) has:

str   StringTheory
lne   StringTheory
fld   StringTheory
x     Long
y     Long
   code
   str.LoadFile('Somefile.CSV')
   str.Split('<13,10>','"')
   loop x = 1 to str.Records()
     Lne.SetValue(Str.GetLine(x))
     Lne.Split(',','"','"',true)
     loop y = 1 to lne.records()
       fld.setValue(lne.GetLine(y))
       if fld.replace('""','"')
         lne.SetLine(y, fld.getValue())
       end
     end
     field1 = Lne.GetLine(1)
     field2 = Lne.GetLine(2)
     ! and so on...
   end

this does NOT allow for the doubled up quotes mentioned in RFC 4180

so if you had a description field with [3" nails] or [12" vinyl LP] you are likely to see [3"" nails] and [12"" vinyl LP] instead.

to get around this I suggested adding a little more code to check for “” in fields and replace with "

str   StringTheory
lne   StringTheory
fld   StringTheory
x     Long
y     Long
   code
   str.LoadFile('Somefile.CSV')
   str.Split('<13,10>','"')
   loop x = 1 to str.Records()
     Lne.SetValue(Str.GetLine(x))
     Lne.Split(',','"','"',true)
     loop y = 1 to lne.records()
       fld.setValue(lne.GetLine(y))
       if fld.replace('""','"')
         lne.SetLine(y, fld.getValue())
       end
     end
     field1 = Lne.GetLine(1)
     field2 = Lne.GetLine(2)
     ! and so on...
   end

now we have that sorted out we can base our processing of a CSV file to create a file layout roughly on that code…

I have created a procedure that takes the csv file name as a parameter:

!==============================================

CreateLayoutFromCSV  PROCEDURE (String pFileName)
!  written on 11th June 2021 by Geoff Robinson vitesse AT gmail DOT com 
!  (c) 2021 by Geoffrey C. Robinson - Released under the MIT License
str     StringTheory
lne     StringTheory
fld     StringTheory
errs    StringTheory
out     StringTheory
x       Long,auto
y       Long,auto
lines   Long,auto
cols    Long,auto
lengths Queue
row1      long
max       long
        end

  CODE
  if not str.LoadFile(pFilename)
    fld.setValue('Error: Failed to load ' & clip(pFileName) & ' Error: ' & str.lastError)
    do ReportError
    return
  end

  str.Split('<13,10>','"')
  str.removeLines()             ! get rid of any empty lines
  lines = str.records()
  if not lines
    fld.setValue('Error: No CSV lines loaded for ' & clip(pFileName))
    do ReportError
    return
  end

  Lne.SetValue(Str.GetLine(1))     ! get first line (possibly header)
  Lne.Split(',','"','"',true)
  cols = lne.records()
  clear(lengths)
  loop y = 1 to cols               ! set row1 width of each column
    fld.setValue(lne.GetLine(y))
    fld.replace('""','"')
    lengths.row1 = fld.length()
    add(lengths)
  end

  loop x = 2 to lines              ! check each row to get max width for each column
    Lne.SetValue(Str.GetLine(x))
    Lne.Split(',','"','"',true)
    if cols <> lne.records()
      fld.setValue('Inconsistent number of columns on line ' & x & |
                  ': Expected ' & cols & ' but found ' & lne.records())
      do reportError
    end
    loop y = 1 to lne.records()    ! loop through columns
      fld.setValue(lne.GetLine(y))
      fld.replace('""','"')
      get(lengths,y)
      if errorcode()
        lengths.row1 = 0
        lengths.max = fld.length()
        add(lengths,y)
        fld.setValue('Adding column ' & y & ' to lengths queue on line ' & x)
        do reportError
      elsif fld._DataEnd > lengths.max
        lengths.max = fld._DataEnd ! new maximum width for this column
        put(lengths)
      end 
    end !loop cols
  end !loop lines
    
  ! now output format twice - first assuming line 1 is header and then assuming it is just data

  out.setValue(fld.CleanFileName(fld.FileNameOnly(pFileName,false)))
  out.replace(' ','')  ! remove any spaces: same as out.remove(' ')
  if out.length() < 18 then out.setLength(18).
  out.append(' FILE,DRIVER(''BASIC''),NAME(''' & clip(pFileName) & '''),PRE(' & clip(upper(out.slice(1,3))) & ')<13,10>')
  out.append('record               RECORD<13,10>')

  Lne.SetValue(Str.GetLine(1))  ! get first line (assume it is header)
  Lne.Split(',','"','"',true)
  cols = lne.records()
  loop y = 1 to cols
    fld.setValue(lne.GetLine(y))
    do CapitalizeFieldName
    get(lengths,y)
    do AddFieldLine
  end
  out.append('                     END<13,10>')
  out.append('                   END<13,10>')
  out.append('<13,10>!----------------------------------------------<13,10,13,10>')

  ! now output format assuming no header record (ie. first line is data)

  out.append(out.before('<13,10>') & '<13,10>')  ! copy first line
  out.append('record               RECORD<13,10>')
  loop y = 1 to cols
    get(lengths,y)
    if lengths.row1 > lengths.max then lengths.max = lengths.row1.
    fld.setValue('Label' & y)
    do AddFieldLine
  end
  out.append('                     END<13,10>')
  out.append('                   END<13,10>')
  out.append('<13,10>!----------------------------------------------<13,10>')

  if errs.length()  ! do we have any errors logged?
    out.append('<13,10>Errors:<13,10,13,10>')
    out.append(errs)
    out.append('<13,10>')
  end

  fld.setValue(pFilename)
  fld.setValue(fld.pathOnly() & '\' & fld.fileNameOnly(,false) & |
               ' layout on ' & clip(left(format(today(),@d12)))    & |
               ' at ' & clip(left(format(clock(),@T05))) & format(clock()%100,@N02) & '.txt')

  out.saveFile(fld.getvalue())

! stop('saved layout for ' & clip(pFileName) & ' to ' & fld.getValue())


AddFieldLine ROUTINE
  if fld.length() < 22 then fld.setLength(22).
  if lengths.max
    fld.append(' STRING(' & lengths.max & ')<13,10>')
  else
    fld.append(' STRING(1)    ! note: no data found<13,10>')
  end
  out.append(fld)


ReportError ROUTINE          ! note: error is in fld
  errs.append(fld.getValue() & '<13,10>')
  fld.trace()
  stop(fld.getValue() & '<13,10>(Error creating layout from ' & clip(pFileName) & ')')


CapitalizeFieldName ROUTINE  ! note field name is in fld
 data
containsLower Long,auto
containsUpper Long,auto
i             Long,auto
 code
  fld.KeepChars('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_: ',' ')
  fld.Split(' ') 
  loop i = 1 to fld.records()
    fld.setValue(fld.getLine(i))
    if fld.length() < 1 then cycle.
    containsLower = fld.containsA('abcdefghijklmnopqrstuvwxyz',fld.slice(2)) ! exclude 1st char which will be UPPERed
    containsUpper = fld.containsA('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    if containsUpper
      if containsLower
        if ~inrange(val(fld.valuePtr[1]),val('a'),val('z')) ! first char not lower?
          cycle ! do NOT change this word in field name label - take it as the mixed case given
        end
      else
        fld.lower()
      end
    end
    fld.valuePtr[1] = upper(fld.valuePtr[1])
    fld.setLine(i,fld.getValue())         ! update value in queue of words
  end
  fld.join('')  ! join the words back together to give the field name label

  if fld.length() < 1
    fld.setValue('_')          ! blank field name
  else
    case val(fld.valuePtr[1])  ! first char should be alpha or underscore
    of   val('A') to val('Z')
    orof val('_')
      ! all ok
    else
      fld.prepend('_')
    end
  end

Running this on a file with a header line:

"ID","Invoice","Inv Date","Amount","G.S.T","INV TOTAL","banked date","Client","transType"

gave the following:

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Id                     STRING(3)
Invoice                STRING(8)
InvDate                STRING(10)
Amount                 STRING(7)
GST                    STRING(4)
InvTotal               STRING(7)
BankedDate             STRING(10)
Client                 STRING(26)
TransType              STRING(13)
                     END
                   END

!----------------------------------------------

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Label1                 STRING(3)
Label2                 STRING(8)
Label3                 STRING(10)
Label4                 STRING(7)
Label5                 STRING(5)
Label6                 STRING(9)
Label7                 STRING(11)
Label8                 STRING(26)
Label9                 STRING(13)
                     END
                   END

!----------------------------------------------

As you can see, what it does is create TWO definitions, one assuming the first line contains the field names, and the second that assumes there is no header line.

Note that the string lengths could be different between the two layouts where a column name in the header is wider than the largest data field.

There is some very rudimentary error handling in ReportError Routine. You may wish to expand this or get rid of the STOP() etc. Errors are logged and added to the end of the format file.

The format file is the name of your data file with "layout on <date> at <time>" added to the end. So each time you run it you will get a new time-stamped file.

The CapitalizeFieldName Routine has got some smarts in it to try to preserve word boundaries through capitalizing first letters. Look, for example, at the output above for:

"ID"           -> Id
"Inv Date"     -> InvDate
"G.S.T"        -> GST
"INV TOTAL"    -> InvTotal
"banked date"  -> BankedDate
"transType"    -> TransType

you could easily change this if you prefer to use underscores between words and so on.

anyway I hope this proves useful - please have a play and report any errors/bugs to me and I will have a look.

2 Likes

Very useful Geoff! Hopefully SV will increase the 100 row limit to read all the records, that would be a simple change.

The DCT Basic Import does analyze Numeric columns and sizes them well as USHORT ULONG or DECIMAL. If there are negative values it will select SHORT or LONG.

You can see that in my post in the How to import CSV into Clarion 11 TopSpeed file without writing code questions topic. Here’s a DCT Imported CSV and the source showing numbers

I’ll paste some code I thought of for figuring out the column type. Each Data row needs to store the max seen for Digits Left and Right of the Decimal, plus if there’s a Deciaml, plus if there’s a negative.

Lengths Queue,PRE(LQ)  !Your Column Lengths Q plus Number
Row1         long
Max          long      

NotNumeric   BYTE  !1=Column w/o a Number found so String
DigitsLeft   BYTE  !Max Digits Left of Decimal Point
DigitsRight  BYTE  !Max Digits Right of Decimal Point i.e. pennies
IsNegative   BYTE  !Less than Zero seen
IsDecimal    BYTE  !'.' seen, can be Zero Digits Right
        end 
NumData     STRING(40)        
Dot         SHORT        !Decimal Point
NumLeft     SHORT        !Digits Left of Decimal Point
NumRight    SHORT        !Digits Right of Decimal Point

!--- For Each Column -----------------------
NumData = LEFT(ColData)     !Remove leading spaces for numbers?
LenNum = LEN(CLIP(NumData))
IF LenNum=0 THEN                            !Is Column blank
    !Basic driver ignores blanks in numeric 
    !or ??? LQ:NotNumeric = True ??? 
ELSIF ~Numeric(NumData) OR LenNum > 31 THEN !Is Column NOT Numeric or too long to be a number 
    LQ:NotNumeric = True                    !Any Non numeric then must be STRING
ELSE
    IF NumData+0 < 0 THEN 
       LQ:IsNegative=True            !allow negative
       IF NumData[1]='-' THEN     !Can there be a trailing -
          NumData=SUB(NumData,2,99)
          LenNum -=  1
       END 
    END 
    Dot = INSTRING('.',NumData)     !Find '.' in 12345.67
    IF Dot THEN                     !Decimal Yes
       LQ:IsDecimal=True 
       NumLeft  = Dot - 1
       NumRight = LenNum - Dot
    ELSE                            !Integer
       NumLeft = LenNum
       NumRight = 0
    END 

    IF LQ:DigitsLeft < NumLeft THEN
       LQ:DigitsLeft = NumLeft
    END 
    IF LQ:DigitsRight < NumRight THEN
       LQ:DigitsRight = NumRight
    END 
END 

!----- Figure out the type 
IF LQ:NotNumeric THEN 
   Type='STRING(' & LQ:Max &')'
ELSIF LQ:IsDecimal OR LQ:DigitsLeft > 9 THEN    !LONG 2,147,483,647
   ! DECIMAL(Length,places) 
   Type='DECIMAL('& LQ:DigitsLeft + LQ:DigitsRight &','&  LQ:DigitsRight &')'
ELSE
  ! IF LQ:DigitsLeft <= 2 AND ~LQ:IsNegative THEN  ! BASIC import does not do BYTE
  !    Type ='BYTE'  
    IF LQ:DigitsLeft <= 4 THEN      !USHORT 65,535
       Type=CHOOSE(~LQ:IsNegative,'USHORT','SHORT') 
    ELSE
       Type=CHOOSE(~LQ:IsNegative,'ULONG','LONG')  !Carl would always do LONG 
    END 
END
1 Like

thanks Carl!

I wish I had re-read your post that you mentioned at How to import CSV into Clarion 11 TopSpeed file without writing code earlier as I had completely forgotten that the dct import of CSV detected types and could get the names!

I see the dct import uses underscores as word separators rather than the Capitalization approach that I took. That would actually be technically simpler to do as you could just replace embedded spaces and other invalid characters with underscores. Anyway I did mention that “you could easily change this if you prefer to use underscores between words” so perhaps I can make that an option. I can see some might prefer the underscore approach with columns like “Max FTE Salary” which would currently be converted to “MaxFteSalary” as opposed to “Max_FTE_Salary”.

thanks also for your type detection code - I will look at incorporating that (or something similar) and will then post an update…

What does setting the type on a CSV column really gain you? Seems safer to validate in code than having the driver always figure it out. e.g., Can you trust the data source to not throw non-conforming data into it?

But if the data is reliable, I wonder if changing (for example) large-ish integer columns from STRING to LONG would allow the driver to pack more data into its “Maximum Record Size”. In that case, it could be very useful if your record structure reaches the maximum size of ~64K. That would be a PITB to test, but interesting to know. :slight_smile:

I can see your point. Having it as STRING also lets you read the Column Heading row to verify the data is still what you expect. It should be optional. I have some CSV files that are very consistent with data types but Sh*t Happens so the STRING allows my code to spot bad data.

What you have created could also be used by ST users by creating a GROUP instead of a RECORD. When there are 10+ columns I would rather code with Field Name Variables (checked by the compiler) than be working with Column Numbers.

Using the GROUP in a LOOP you could move the ST Columns by Number to WHAT(Group,Number).

Could you post your complete source and project as an attachment?

As I said in the other thread defining the BASIC file in the DCT does have some advantages as you can use it with Templates. You can view the file in the Data Base browser. You can Wizard a Browse / Form to view it, and create Reports. You can Wizard a Process to import or process the file.

I think an independent FILE/GROUP generation utility would be useful, especially where you have a very large file that might be too big to load all into memory.
But there are tradeoffs to be made. You might still get some truncated data if you don’t read all of the rows to get the max size.
The GROUP idea is a good one. Would be simple enough to break the RECORD generation of my GenerateFileDef() method off to another method. Or you could just do that part by hand in the mean-time.
The code I recently added for GenerateFileDef() is pretty simple.

I have finished incorporating the type detection as suggested by Carl and also done a few other enhancements including underscore separated as well as the existing Capitalized field names.

I also noted Jeff’s comments that perhaps STRING() types were better after all, so it gives you the choice as it now outputs six formats rather than the two it did previously:

  1. assumming first row is header with field names capitalized, detect/guess field types
  2. assumming first row is header with field names underscore separated, detect/guess field types
  3. assumming first row is header with field names capitalized, all fields string types
  4. assumming first row is header with field names underscore separated, all fields string types
  5. assumming no header, detect/guess field types
  6. assumming no header, all fields string types

note that if your CSV file does have a header row, then the last two formats will be of little value as the widths may reflect the header row field names rather than the actual data. Also in this case the field type detection is likely to present as all strings as the header names probably contain non-numeric data.

similarly if your data does not have a header row, then the opposite is true and the final two formats will be the ones you will want to use.

for example the earlier example I gave with a header of

“ID”,“Invoice”,“Inv Date”,“Amount”,“G.S.T”,“INV TOTAL”,“banked date”,“Client”,“transType”

now gives the following 6 layouts

!----------------------------------------------
! 1) assumming first row is header with field names capitalized, detect/guess field types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Id                     USHORT
Invoice                STRING(8)
InvDate                STRING(10)
Amount                 DECIMAL(7,2)
GST                    DECIMAL(3,2)
InvTotal               DECIMAL(7,2)
BankedDate             STRING(10)
Client                 STRING(26)
TransType              STRING(13)
                     END
                   END

!----------------------------------------------
! 2) assumming first row is header with field names underscore separated, detect/guess field types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
ID                     USHORT
Invoice                STRING(8)
Inv_Date               STRING(10)
Amount                 DECIMAL(7,2)
G_S_T                  DECIMAL(3,2)
INV_TOTAL              DECIMAL(7,2)
banked_date            STRING(10)
Client                 STRING(26)
transType              STRING(13)
                     END
                   END

!----------------------------------------------
! 3) assumming first row is header with field names capitalized, all fields string types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Id                     STRING(3)
Invoice                STRING(8)
InvDate                STRING(10)
Amount                 STRING(7)
GST                    STRING(4)
InvTotal               STRING(7)
BankedDate             STRING(10)
Client                 STRING(26)
TransType              STRING(13)
                     END
                   END

!----------------------------------------------
! 4) assumming first row is header with field names underscore separated, all fields string types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
ID                     STRING(3)
Invoice                STRING(8)
Inv_Date               STRING(10)
Amount                 STRING(7)
G_S_T                  STRING(4)
INV_TOTAL              STRING(7)
banked_date            STRING(10)
Client                 STRING(26)
transType              STRING(13)
                     END
                   END

!----------------------------------------------
! 5) assumming no header, detect/guess field types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Label1                 STRING(3)
Label2                 STRING(8)
Label3                 STRING(10)
Label4                 STRING(7)
Label5                 STRING(5)
Label6                 STRING(9)
Label7                 STRING(11)
Label8                 STRING(26)
Label9                 STRING(13)
                     END
                   END

!----------------------------------------------
! 6) assumming no header, all fields string types

test1              FILE,DRIVER('BASIC'),NAME('.\testdata\test1.csv'),PRE(TES)
record               RECORD
Label1                 STRING(3)
Label2                 STRING(8)
Label3                 STRING(10)
Label4                 STRING(7)
Label5                 STRING(5)
Label6                 STRING(9)
Label7                 STRING(11)
Label8                 STRING(26)
Label9                 STRING(13)
                     END
                   END

!----------------------------------------------

CreateLayoutFromCSV 20210614 1102.clw (11.9 KB)

Sorry Carl I don’t currently have a project as this is embedded in a large app but I have done an attachment with the complete source code (immediately above) so hopefully that works for you.

and this is the same source code scrollable/viewable here:

CreateLayoutFromCSV  PROCEDURE  (String pFileName)
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
!
!  written on 11th June 2021 by Geoff Robinson vitesse AT gmail DOT com 
!  (c) 2021 by Geoffrey C. Robinson - Released under the MIT License
!
!  see: https://clarionhub.com/t/create-a-file-layout-from-a-csv-file/4232
!
!  14th June 2021 - GCR incorporated type detection for fields as suggested by Carl Barnes
!                 - added underscore separated as well as existing Capitalized field names
!                 - now outputs six different formats
!
!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
str     StringTheory
lne     StringTheory
fld     StringTheory
errs    StringTheory
out     StringTheory
tmp     StringTheory
header  StringTheory  ! format header
trailer StringTheory  ! format trailer
x       Long,auto
y       Long,auto
lines   Long,auto
cols    Long,auto
decLen  Long,auto

attribs group,type ! column attibutes group
max          long  ! maximum characters
notNumeric   BYTE  ! 1=Column w/o a Number found so String
digitsLeft   BYTE  ! Max Digits Left of Decimal Point
digitsRight  BYTE  ! Max Digits Right of Decimal Point i.e. pennies
isNegative   BYTE  ! Less than Zero seen
        end

Q       queue
Row1      group(attribs).   ! attributes of first row
Rest      group(attribs).   ! attributes of the rest of the rows
        end

attribsPtr &attribs
  
dot          LONG  ! Decimal Point
numLeft      BYTE  ! Digits Left  of Decimal Point
numRight     BYTE  ! Digits Right of Decimal Point
                             
allString    BYTE
capitalize   BYTE

  CODE
  if not str.LoadFile(pFilename)
    fld.setValue('Error: Failed to load ' & clip(pFileName) & ' Error: ' & str.lastError)
    do ReportError
    return
  end

  str.Split('<13,10>','"')
  str.removeLines()                ! get rid of any empty lines
  lines = str.records()
  if not lines
    fld.setValue('Error: No CSV lines loaded for ' & clip(pFileName))
    do ReportError
    return
  end

  Lne.SetValue(Str.GetLine(1))     ! get first line (possibly header)
  Lne.Split(',','"','"',true)
  cols = lne.records()
  attribsPtr &= Q.row1
  loop y = 1 to cols               ! set row1 width of each column
    clear(Q)
    fld.setValue(lne.GetLine(y))
    fld.replace('""','"')
    Q.row1.max = fld.length()
    if ~Q.row1.notNumeric then do CheckNumber.    
    add(Q)
  end

  attribsPtr &= Q.rest             ! attributes for subsequent rows (after the first one)
  loop x = 2 to lines              ! check each row to get max width for each column
    Lne.SetValue(Str.GetLine(x))
    Lne.Split(',','"','"',true)
    if cols <> lne.records()
      fld.setValue('Inconsistent number of columns on line ' & x & |
                  ': Expected ' & cols & ' but found ' & lne.records())
      do reportError
    end
    loop y = 1 to lne.records()    ! loop through columns
      fld.setValue(lne.GetLine(y))
      fld.replace('""','"')
      get(Q,y)
      if errorcode()
        clear(Q)
        Q.rest.max = fld.length()
        add(Q,y)
        fld.setValue('Adding column ' & y & ' to lengths attribute queue on line ' & x)
        do reportError
      elsif fld._DataEnd > Q.rest.max
        Q.rest.max = fld._DataEnd ! new maximum width for this column
      end 
      if ~Q.rest.notNumeric then do CheckNumber.
      put(Q)
    end !loop cols
  end !loop lines
    
  ! set up header                                                 
  header.setValue(fld.CleanFileName(fld.FileNameOnly(pFileName,false)))
  header.replace(' ','')  ! remove any spaces: same as out.remove(' ')
  if header.length() < 18 then header.setLength(18).
  header.append(' FILE,DRIVER(''BASIC''),NAME(''' & clip(pFileName) & '''),PRE(' & clip(upper(header.slice(1,3))) & ')<13,10>')
  header.append('record               RECORD<13,10>')
 
  ! set up trailer
  trailer.setValue('                     END<13,10>'     & |
                   '                   END<13,10,13,10>' & |
                   '!----------------------------------------------<13,10>')

  out.setValue('!----------------------------------------------<13,10>')
 
 ! output format six times....

  out.append('! 1) assumming first row is header with field names capitalized, detect/guess field types<13,10,13,10>')
  capitalize = true
  allString  = false
  do OutputAssumingRow1Header

  out.append('! 2) assumming first row is header with field names underscore separated, detect/guess field types<13,10,13,10>')
  capitalize = false
  allString  = false
  do OutputAssumingRow1Header

  out.append('! 3) assumming first row is header with field names capitalized, all fields string types<13,10,13,10>')
  capitalize = true
  allString  = true
  do OutputAssumingRow1Header
                                         
  out.append('! 4) assumming first row is header with field names underscore separated, all fields string types<13,10,13,10>')
  capitalize = false
  allString  = true
  do OutputAssumingRow1Header

  out.append('! 5) assumming no header, detect/guess field types<13,10,13,10>')
  allString  = false
  do OutputAssumingNoHeaderRow

  out.append('! 6) assumming no header, all fields string types<13,10,13,10>')
  allString  = true
  do OutputAssumingNoHeaderRow

  if errs.length()  ! do we have any errors logged?
    out.append('<13,10>Errors:<13,10,13,10>')
    out.append(errs)
    out.append('<13,10>')
  end

  fld.setValue(pFilename)
  fld.setValue(fld.pathOnly() & '\' & fld.fileNameOnly(,false) & |
               ' layout on ' & clip(left(format(today(),@d12)))    & |
               ' at ' & clip(left(format(clock(),@T05))) & format(clock()%100,@N02) & '.txt')
  out.saveFile(fld.getvalue())

 !stop('saved layout for ' & clip(pFileName) & ' to ' & fld.getValue())

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

OutputAssumingRow1Header ROUTINE
  out.append(header)                   
  attribsPtr &= Q.rest          ! ignore size settings of first line and just use the rest
  Lne.SetValue(Str.GetLine(1))  ! get first line (assume it is header)
  Lne.Split(',','"','"',true)
  cols = lne.records()
  loop y = 1 to cols
    fld.setValue(lne.GetLine(y))
    if capitalize
      do CapitalizeFieldName
    else
      do UnderscoreSeparateFieldName
    end
    get(Q,y)
    do AddFieldLine
  end
  out.append(trailer)

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

OutputAssumingNoHeaderRow ROUTINE
  attribsPtr &= Q.rest
  out.append(header)                   
  loop y = 1 to cols
    get(Q,y)
    if Q.row1.max > Q.rest.max then Q.rest.max = Q.row1.max.
    if not allString
      if Q.row1.notNumeric then Q.rest.notNumeric = true.
      if ~Q.rest.notNumeric
        if Q.row1.digitsLeft  > Q.rest.digitsLeft  then Q.rest.digitsLeft  = Q.row1.digitsLeft.
        if Q.row1.digitsRight > Q.rest.digitsRight then Q.rest.digitsRight = Q.row1.digitsRight.
        if Q.row1.isNegative then Q.rest.isNegative = true.
      end
    end
    fld.setValue('Label' & y)
    do AddFieldLine
  end
  out.append(trailer)

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

AddFieldLine ROUTINE
  if fld.length() < 22 then fld.setLength(22).
  if attribsPtr.max = 0
    fld.append(' STRING(1)    ! note: no data found<13,10>')
  elsif attribsPtr.notNumeric or allString
    fld.append(' STRING(' & attribsPtr.max & ')<13,10>')
  elsif attribsPtr.digitsRight or attribsPtr.DigitsLeft > 9
    decLen = attribsPtr.digitsLeft + attribsPtr.DigitsRight
    if decLen%2 = 0 then decLen += 1.     ! move up to odd length for decimals
    fld.append(' DECIMAL(' & decLen & ',' & attribsPtr.DigitsRight & ')<13,10>')
! elsif Q.DigitsLeft[grpIdx] <= 2   ! Carl notes BASIC import does not do BYTE
!   fld.append(' BYTE<13,10>')
  elsif attribsPtr.DigitsLeft <= 4   ! USHORT 65,535
    if attribsPtr.isNegative
      fld.append(' SHORT<13,10>')
    else
      fld.append(' USHORT<13,10>')
    end
  else
    if attribsPtr.isNegative
      fld.append(' LONG<13,10>')
    else
      fld.append(' ULONG<13,10>') 
    end
  end
  out.append(fld)

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

ReportError ROUTINE                   ! note: error is in fld
  errs.append(fld.getValue() & '<13,10>')
  fld.trace()
  stop(fld.getValue() & '<13,10>(Error creating layout from ' & clip(pFileName) & ')')

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

CheckNumber ROUTINE                   ! note field value is in fld
  fld.trim()                          ! remove spaces from both ends
  tmp.setValue(fld)                   ! make a temporary copy
  tmp.trim('-')                       ! remove '-' from both ends
  tmp.removeChars(',.')               ! 1,234.56 -> 123456
  if tmp._DataEnd > 31 or             | too long for number
     tmp.KeepChars('0123456789') or   | contains non-numeric chars
     fld.count('-') > 1 or            |
     fld.count('.') > 1
    attribsPtr.notNumeric = true ! invalid numeric so store as string
  else
    if fld.containsChar('-')
      attribsPtr.IsNegative = true
      fld.trim('-')
    end
    fld.replace(',','') !fld.removeChars(',')
    dot = fld.findChar('.')
    if dot
      numLeft  = dot - 1
      numRight = fld._DataEnd - Dot
    else
      numLeft = fld._DataEnd
      numRight = 0
    end
    if numLeft  > attribsPtr.digitsLeft  then attribsPtr.digitsLeft  = numLeft.
    if numRight > attribsPtr.digitsRight then attribsPtr.digitsRight = numRight.
  end

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                         
CapitalizeFieldName ROUTINE  ! note field name is in fld
 data
containsLower Long,auto
containsUpper Long,auto
i             Long,auto
 code
  fld.KeepChars('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_: ',' ')
  fld.Split(' ') 
  loop i = 1 to fld.records()
    fld.setValue(fld.getLine(i))
    if fld.length() < 1 then cycle.
    containsLower = fld.containsA('abcdefghijklmnopqrstuvwxyz',fld.slice(2)) ! exclude 1st char which will be UPPERed
    containsUpper = fld.containsA('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    if containsUpper
      if containsLower
        if ~inrange(val(fld.valuePtr[1]),val('a'),val('z')) ! first char not lower?
          cycle ! do NOT change this word in field name label - take it as the mixed case given
        end
      else
        fld.lower()
      end
    end
    fld.valuePtr[1] = upper(fld.valuePtr[1])
    fld.setLine(i,fld.getValue())         ! update value in queue of words
  end
  fld.join('')  ! join the words back together to give the field name label

  if fld._DataEnd < 1
    fld.setValue('_')          ! blank field name
  else
    case val(fld.valuePtr[1])  ! first char should be alpha or underscore
    of   val('A') to val('Z')
    orof val('_')
      ! all ok
    else
      fld.prepend('_')
    end
  end

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

UnderscoreSeparateFieldName ROUTINE  ! note field name is in fld
  fld.KeepChars('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_:','_')
  fld.trim('_')
  loop
    if not fld.replace('__','_') then break. ! easier/faster than recursive replace option
  end

  if fld._DataEnd < 1
    fld.setValue('_')          ! blank field name
  else
    case val(fld.valuePtr[1])  ! first char should be alpha or underscore
    of   val('A') to val('Z')
    orof val('a') to val('z')
    orof val('_')
      ! all ok
    else
      fld.prepend('_')
    end
  end

!-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1 Like

Hi Everyone,
Thank you very much for taking my query from the Clarion Newsgroup and running with it.
I’ll see if I can incorporate it into my app.

Best Regards,

Steve.

1 Like

Over on the newsgroups I described to Steve how to import the procedure into his app by creating an new source procedure and copy/pasting into the relevant embeds.

Then it occurred to me that a TXA would likely be easier. So here it is:

CreateLayoutFromCSV.txa (12.4 KB)

make sure you make a backup of your app first, then Application->Import Txa

hope that helps

cheers

Geoff R