User Tools

Site Tools


k6:k6.13:start

6.13 Class CsvFile

The class CsvFile (gb.util) can be used to read a CSV file and decode its contents. Note: It is assumed that the first line of the CSV file necessarily contains the field names. If not all field names are given, then the missing field name is replaced by a # character followed by the index of the field in the imaginary field list. In general: The CsvFile class cannot handle CSV files that do not have a (first) line with the field names (header), although the header with the field names is optional according to RFC 4180!

You can create an object of the class. It will open a CSV file for reading:

Dim hCsvFile As CsvFile
hCsvFile = New CsvFile ( Path As String [ , Separator As String, Escape As String ] )
  • Path - path of the CSV file.
  • Separator - The field separator is optional. By default it is a comma.
  • Escape - The character that encloses a field value is optional. By default, it is double quotes, such as for “FieldValue”.

6.13.1 Properties

The CsvFile class has three properties:

PropertyData typeDescription
EofBooleanTrue is returned if the end of the CSV file was reached while reading.
FieldsString[ ]Return a list of field names as string array.
LineIntegerReturns the number of the last read line.

Table 6.13.1.1 : Properties of the CsvFile class

All field names in the list of field names are normalized using the following algorithm:

  • The field name is converted to lowercase.
  • Newlines and tabs are replaced by spaces.
  • Spaces with Unicode 160 are replaced by normal spaces.
  • If a space follows another space, it is removed.
  • If a (field) name is empty, it is replaced by a # character - followed by the field index number.

6.13.2 Methods

The CsvFile class has only this one method:

Function Read() As Collection

The function reads a CSV file and returns it as a collection of field values indexed by their field names: CsvFile.Fields property. The field values are normalized using the following algorithm:

  • Spaces with Unicode 160 are replaced by normal spaces.
  • Extra spaces or control characters (ASCII code less than 32) at the beginning or end of a field value are removed.

6.13.3 Notes

You should be aware of and follow the following hints:

  • In reference to RFC 4180, a comment stated “In some CSV implementations, leading and trailing spaces and tabs are trimmed (ignored). Such trimming is prohibited by RFC 4180, which means “spaces are considered part of a field and should not be ignored.””
  • However, this clear rule is bypassed in the present CsvFile class both when extracting the field names - if there are any in the 1st line - and when reading the field values.
  • In some CSV files, missing or unknown field values are marked by “” or by NULL. But if you then read a line with the Read() method of the CsvFile class, (at least) one key-value pair is missing from the function value of the Read method returned as Collection. The explanation for this behavior is simple, because the empty string is the same as zero in Gambas and putting zero in a collection is the same as deleting the key from the collection!
  • A collection is a hash table. This means, among other things, that you must never assume a particular order of keys when iterating through a Collection with a FOR-EACH control structure. In plain language: With For-Each you get the values in the Collection a priori in arbitrary, non-deterministic order. So you must not assume that you get them in exactly the same order as they are in a row of a CSV file!

The correct conversion of the rows in a CSV file to Variant[] instead of Collection, for example, would look like this, where you get all the field names via the CsvFile.Fields property:

Private Function GetData() As Variant[]
  Dim aFieldList As Variant[]
  Dim cLine As Collection
  Dim sField As String, aFields As String[]
 
  aFieldList = New Variant[]
 
  While Not hCSVFile.Eof
    cLine = hCSVFile.Read()
    aFields = New String[]
    For Each sField In hCSVFile.Fields
      aFields.Add(cLine[sField])
    Next
    aFieldList.Add(aFields)
  Wend
 
  Return aFieldList
 
End

For example, if you want to know if the field with the field name “location” is set in the current row, then you need to use the Exist() method for Collection:

Dim cLine As Collection
...
cLine = hCSVFile.Read()
If Not cLine.Exist("ort") Then ...

Here are more hints for creating and editing CSV files - found on the website https://www.thoughtspot.com/blog/6-rules-creating-valid-csv-files - by Jon Avrach, the observance of which will minimize problems, especially when using CSV files to import data into databases:

  • Separate data fields with a field separator (separator), usually a comma. If you do not use a comma, you should choose, for example, a tab character or a pipe character or a semicolon.
  • Include a complete header with a list of field names in the first line of the file. While this is optional, it is strongly recommended because it acts like documentation of the contents of the file, since you can often infer the data type of the field values from the field names.
  • Make sure that the header is formatted in the same way as the other lines with the same field separator and the same escape character - if used.
  • Keep each record on a separate line. The rule here is that each record must start on its own line. However, a single record can span multiple lines.
  • After the last record, the last line should end without an end-of-line character.
  • Remember that the enclosing escape character (typically double quotes) must be used if, for example, the field separator is present in a field as in numbers in the (German) notation x*,y* as in 23,345.

For more details on these notes, you can get good information on Wikipedia with appropriate search criteria and in the RFC 4180 (CSV specification).

In the projects about CSV, 2 cases have been distinguished. The first case is about writing CSV files with data from different sources. The sources were data in controls such as GridView or TextArea or XML files or database tables. The sources for the second case were CSV files whose content is read and processed, saved or displayed in different ways. Only for this case, the CsvFile class from the gb.util component can be used.

In the chapter → https://www.gambas-buch.de/doku.php?id=k17:k17.7:k17.7.6:start you can find a project for exporting selected data to a CSV file and in the chapter at https://www.gambas-buch.de/doku.php?id=k17:k17.7:k17.7.7:start you can find a project for importing data from a CSV file. Interesting in both cases is the possibility to specify both the separator character and the escape character. You can also specify whether the field names should be saved as headers as well, or preview to see if a (first) line with the field names as headers exists.

B1
Figure 6.13.3.1: Data export

B2
Figure 6.13.3.2: Data import.

The archives for these two revised projects can be found in the download area.

6.13.4 Project 1 - CSV2XML

The first project first reads a CSV file in the dialog. Then the content of the CSV file is prepared in such a way that

  • blank lines are ignored,
  • field values without escape characters are 'wrapped' by the fixed escape character,
  • missing field values are replaced by an empty string or by NULL, and
  • it is checked that the last line does not have an end-of-line character '\n'.

The thus prepared content is saved under the original file name. Afterwards, the content of the prepared CSV file is decoded with a new CsvFile object, specifying the field separator and the escape character, and written to an XML document (Chapter 27.2.1). Finally, the XML document is saved. There is no check if the CSV file contains a list of field names in the first line. A header is simply expected, because otherwise the use of the class CsvFile is not possible!


Figure 6.13.4.1: Content of the prepared CSV file


Figure 6.13.4.2: Extract from the contents of the XML file.

The source code is given in full and commented:

[1] ' Gambas class file
[2]
[3] Private $sSeparator As String
[4] Private $sEscape As String
[5] Public hXmlWriter As XmlWriter
[6] Public sFilePath As String
[7] Public hCSVFile As CsvFile
[8]
[9] Public Sub Form_Open()
[10]   FMain.Center
[11]   FMain.Resizable = True
[12]   FMain.Caption = "CSV2XML | Gambas-Version: " & System.FullVersion
[13]
[14]   $sSeparator = ","
[15]   $sEscape = "\"" ' Alternative: Chr(34) => "
[16]
[17]   btnConvertAndSave.Enabled = False
[18]
[19] End ' Form_Open()
[20]
[21] Public Sub btnPrepareAndSave_Click()
[22]   sFilePath = FileOpen()
[23]   If sFilePath Then
[24]      txaPrepare.Text = Prepare(sFilePath)
[25]      txaPrepare.Pos = 0
[26]      File.Save(sFilePath, Prepare(sFilePath))
[27]      Wait
[28]      btnPrepareAndSave.Enabled = False
[29]      btnConvertAndSave.Enabled = True
[30]   Else
[31]     Return
[32]   Endif
[33] End
[34]
[35] Public Sub btnConvertAndSave_Click()
[36]   WriteXML()
[37]   btnPrepareAndSave.Enabled = True
[38]   btnConvertAndSave.Enabled = False
[39] End
[40]
[41] Public Sub btnClose_Click()
[42]   FMain.Close()
[43] End ' Close
[44]
[45] Private Function Prepare(sFilePath As String) As String
[46]
[47]   Dim sCSVContent, sCSVChanged, sLine, sField, sNewLine As String
[48]
[49]   sCSVContent = File.Load(sFilePath)
[50]
[51]   For Each sLine In Split(sCSVContent, gb.NewLine, $sEscape, False, True)
[52]     sLine = Trim(sLine)
[53]     If Not sLine Then Continue ' Blank lines are ignored
[54]     sNewLine = ""
[55]     For Each sField In Split(sLine, $sSeparator, $sEscape, False, True)
[56]       If sField Not Begins $sEscape And If sField Not Ends $sEscape Then
[57]          sField = $sEscape & sField & $sEscape
[58]       Endif
[59]       If sField = $sEscape & $sEscape Or If sField = $sEscape & " " & $sEscape Then
[60]        ' sField = $sEscape & "" & $sEscape ' Case 1
[61]          sField = "ZERO"                   ' Case 2
[62]       Endif
[63]       sNewLine &= sField & $sSeparator
[64]     Next
[65]      If sNewLine Ends $sSeparator Then sNewLine = Left(sNewLine, -1)
[66]      sCSVChanged &= sNewLine & gb.NewLine
[67]   Next
[68]
[69]   sCSVChanged = Left(sCSVChanged, -1) ' Line end character of the *last* line is removed
[70]
[71]   Return sCSVChanged
[72]
[73] End ' Prepare(...)
[74]
[75] Private Sub WriteXML()
[76]
[77]   Dim k As Integer
[78]   Dim cLine As Collection
[79]   Dim sField As String
[80]
[81]   hCSVFile = New CsvFile(sFilePath, $sSeparator, $sEscape)
[82]
[83]   hXmlWriter = New XmlWriter
[84]   hXmlWriter.Open(Zero, True, "UTF-8")
[85]
[86]   hXmlWriter.PI("Document", "version=\"" & Format(Now, "dd.mm.yyyy-hh:nn") & "\"")
[87]
[88]   hXmlWriter.StartElement("root")
[89]   hXmlWriter.Comment("DataBase: " & File.Name(sFilePath))
[90]
[91]   While Not hCSVFile.Eof
[92]     hXmlWriter.StartElement("item")
[93]     cLine = hCSVFile.Read()
[94]     k = 0
[95]     For Each sField In hCSVFile.Fields
[96]       If sField = "zero" Then
[97]          hXmlWriter.Element("field_" & CStr(k + 1), cLine[sField])
[98]       Else
[99]          hXmlWriter.Element(sField, cLine[sField])
[100]     Endif
[101]     Inc k
[102]    Next
[103]    hXmlWriter.EndElement
[104]  Wend
[105]
[106]   hXmlWriter.EndElement ' root
[107]   hXmlWriter.EndDocument
[108]   txaPrepare.Text = hXmlWriter.Data
[109]   txaPrepare.Pos = 0
[110]   File.Save(File.Dir(sFilePath) &/ File.BaseName(sFilePath) & ".xml", hXmlWriter.Data)
[111]
[112] End
[113]
[114] Private Function FileOpen() As String
[115]   Dialog.Title = "Import a csv file!"
[116]   Dialog.Filter = ["*.csv", "csv-Files"]
[117]   Dialog.Path = Application.Path &/ "CSV"
[118]   If Dialog.OpenFile(False) = True Then ' Multiselect=False (Standard)
[119]      Message.Info("The opening of the csv file was cancelled!")
[120]      Return
[121]   Else
[122]      Return Dialog.Path
[123]   Endif
[124] End ' FileOpen()

Comment:

  • In lines 45 to 73, the content of the CSV file - opened in line 49 - is prepared according to the above specifications.
  • To do this, first a line is read in line 51 and then in line 55 each field in the current line is examined and then in line 63 a new line is created field by field.
  • The changed content of the CSV file is displayed in a TextArea in line 24 and saved in a CSV file with original path in line 26.
  • In lines 75 to 112 the XML file is written. The XmlWriter class is used → Chapter 27.2.0. The main load is carried by the While control structure in lines 91 to 104. As you can see, almost all the properties and the Read() method of the CsvFile class are used. Note that the Read() method returns as a function value a collection containing all key-value pairs of a row.
  • The control structure in lines 96 to 100 creates pseudo field names for non-existent field names in the header, consisting of a # character followed by a field index.

6.13.5 Digression - CSV2XML

In this digression, you will be presented with a (bash) script that likewise allows you to quickly transform the contents of a CSV file into the contents of an XML file. The executable script expects exactly 3 parameters:

→ the file name of the CSV file,→ the field separator, and the→ escape character:

hans@mint71 ~/GB3BUCH/.../Projekte/CSV_READ/SH-Skript $ ./test.sh "db.csv" "," "\""

If the number of parameters is not 3, then there is an error message with a hint about the correct syntax:

Syntax: ./test.sh 'CSV-File-Path' 'Field separator' 'Escape character'
# hans@mint71 ~/GB3BUCH/.../Projekte/CSV_READ/SH-Skript $ ./test.sh "db.csv" ","

If this is the content of the CSV file:

"First name","Age","Residence"
"Hans","68","Pößneck"
"Peter","56","Cork"
"Anna","18","Jena"

then here you can see the result in the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Data>
  <Record>
    <First name>Hans</First name>
    <Age>68</Age>
    <Residence>Pößneck</Residence>
  </record>
  <Record>
    <First name>Peter</First name>
    <age>56</age>
    <Residence>Cork</Residence>
  </record>
  <Record>
    <FirstName>Anna</FirstName>
    <Age>18</Age>
    <Residence>Jena</Residence>
  </Record>
</Data>

The tested source code from the (bash) script is well documented and is given in full:

#!/bin/bash # She-Bang Line → Bash interpreter
 
  if [ $# -ne 3 ] # If the number of parameters (= $#) is not equal to 3 ...
  then
    # FARB-WERTE:
    RO="\033[31m"    # red
    NO="\033[0m"     # regular
    echo -e "${RO}Syntax: $0 'CSV-File-Path' 'Field separator' 'Escape character'"
    exit 1
  fi
  # Saving the variable IFS in the variable backIFS | IFS = Internal Field Separator
  backIFS=$IFS
 
  file_in=$1 # Parameter 1
  echo $file_in
  # Filename for file_out (XML-File) generate
  filebasename=${file_in%.*}
  fileextension=".xml"
  file_out=$filebasename$fileextension
  echo $file_out
 
  separator=$2 # Parameter 2
  escape=$3    # Parameter 3
 
  # 1. Read line (= list of field names) from CSV file
  read header_line < "$file_in"
  IFS=$separator
  # store list of field names in array 'header' -> operator =()
  header=($header_line)
 
  IFS='*'
 
  # Read in the content of the CSV file line by line and save it in the array 'content'.
  i=0
  while read content[$i]
  do
    # echo "Line $i: ${content[i]}"
    ((i++))
  done < $1
 
  # 1. Delete row in array 'content
  unset content[0]
  # copy array - but without the now empty first element
  content=(${content[*]})
 
  # write content of the XML file
  #-------------------------------------------------------------------------------
  # XML-Prolog
  echo '<?xml version="1.0" encoding="UTF-8"?>' > $file_out
  # XML-Root-Element
  echo '<Data>' >> $file_out
  # XML-Elemente
  #-------------------------------------------------------------------------------
  for record in "${content[@]}"
  do
  echo '  <Record>' >> $file_out
    index=0
    #.............................................................................
 
    IFS=$separator list=($record)
 
    for (( c=0; c<=${#header[@]}-1; c++ ))
    do
      tag=${header[$c]#$escape}
      tag=${tag%$escape}
      value=${list[$c]#$escape}
      value=${value%$escape}
      echo '    <'${tag}'>'${value}'</'${tag}'>' >> $file_out
    done
 
    #.............................................................................
    echo '  </Record>' >> $file_out
    ((index++))
  done
  #-------------------------------------------------------------------------------
  echo '</Data>' >> $file_out
 
  IFS=$backIFS
 
  exit 0

6.13.6 Project 2 - CSV2GRID

In the second project, a CSV file is read. The data decoded with the CsvFile read method is displayed in a GridView.

Source code:

' Gambas class file
 
Public Sub btnReadCSV_Click()
 
  Dim i, iVal As Integer
  Dim sField As New String[]
  Dim sKey, sSeparator, sEscape As String
  Dim hCsvFile As CsvFile
  Dim cLine As Collection
  sSeparator = ","
  sEscape = "\"" ' Chr$(34)
' Open CSV file
  Try hCsvFile = New CsvFile(FileOpen(), sSeparator, sEscape)
  If Error Then Return
  GridView1.Columns.Count = 0
' Prepare header
  GridView1.Header = True
  sField = hCsvFile.Fields
  GridView1.Columns.Count = sField.Count
  For i = 0 To sField.Count - 1
    GridView1.Columns[i].Title = UCase(sField[i])
  Next
' Read CSV data
  While Not hCsvFile.Eof
    cLine = hCsvFile.Read()
    If cLine.Count > 0 Then ' Collection not empty?
       Inc GridView1.Rows.Count
       iVal = 0
       For Each sKey In hCsvFile.Fields
         Gridview1[GridView1.Rows.Count - 1, iVal].text = cLine[sKey]
         Inc iVal
       Next
    Endif
  Wend
 GridView1.MoveTo(0, 0)
 GridView1.Columns.Width = -1 ' Adjust column width automatically
End
 
Private Function FileOpen() As String
 
  Dialog.Title = ("Import csv-File!")
  Dialog.Filter = ["*.csv", "csv-Files"]
  Dialog.Path = Application.Path &/ "Files"
  If Dialog.OpenFile(False) = True Then ' Multiselect=False (Standard)
     Message.Info(("Opening the csv file has been canceled!"))
     Return "break"
  Else
     Return Dialog.Path
  Endif
 
End ' FileOpen()

B5
Figure 6.13.6.1: Data from a CSV file - displayed in a GridView.

6.13.7 Project 3 - QSLITE2CSV

Project 3 is dedicated to the case where data from a database table is stored in a CSV file. Note: In the case of database tables, MyQSL, PostgreSQL or SQlite provide dump or copy methods that can be used to save data directly to a CSV file. Therefore, only the method in which the selected data from a DB table is converted to a string representing the contents of a CSV file is presented:

Private Function GetCSVData() As String
 
  Dim sCSVLine, sRequest, sSeparator, sEscape As String
  Dim hRequest As SQLRequest
  Dim dbResult As Result
  Dim rField As ResultField
 
  hRequest = hConnection.SQL.Select("").From(cmbDBTabellen.Text)
  sRequest = hRequest.Get()
  dbResult = hConnection.Exec(sRequest)
 
  If dbResult.Count = 0 Then
     Message(("The number of selected records is zero!"))
     Return
  Endif
 
  FMain.Caption = "Data export from database: '"
  FMain.Caption &= File.Name(sDBFilePath) & "' "
  FMain.Caption &= " Table: '" & cmbDBTables.Text
  FMain.Caption &= "' in CSV file"
 
  sSeparator = ","  ' Field separator
  sEscape = "\""    ' Text separator
 
' Field list
  For Each rField In dbResult.Fields
    sCSVLine &= sEscape & Upper(rField.Name) & sEscape & sSeparator
  Next
  sCSVLine = Left(sCSVLine, -1) ' Last field name WITHOUT field separator
  sCSVLine &= gb.NewLine
 
' Data lines
  For Each dbResult
    For Each rField In dbResult.Fields
      If dbResult[rField.Name] = "" Then
         sCSVLine &= sEscape & "NULL" & sEscape & sSeparator
      Else
         sCSVLine &= sEscape & dbResult[rField.Name] & sEscape & sSeparator
      Endif
    Next
    sCSVLine = Left(sCSVLine, -1) ' Last field name WITHOUT field separator
    sCSVLine &= gb.NewLine
  Next
 
  sCSVLine = Left(sCSVLine, -1) ' Last line WITHOUT gb.NewLine
  Return sCSVLine
 
End

B6
Figure 6.14.7.1: Data Export DB Data to CSV File

In the TextArea you can see an excerpt from the content of the CSV file.

Hint:
In the chapter https://www.gambas-buch.de/doku.php?id=k27:k27.6:start#projekt_2xml_csv a project is presented, where the transformation XML → CSV is implemented.

Download

The website uses a temporary session cookie. This technically necessary cookie is deleted when the browser is closed. You can find information on cookies in our privacy policy.
k6/k6.13/start.txt · Last modified: 29.01.2022 (external edit)

Page Tools