9 Replies - 19222 Views - Last Post: 30 December 2013 - 07:53 AM Rate Topic: -----

#1 Feldar  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 7
  • Joined: 03-November 09

VB.Net Splitting a csv line with commas and quotes?

Posted 11 March 2010 - 03:27 PM

I have a comma separated file that has commas in the data, so using Split() doesn't work. I tried to use Jet 4.0, but a column has text and numeric data in it, so half of the values return null. After researching for hours and getting nowhere I attempted to write my own split function that checks for quotes so it can skip past the extra commas in the line. I fiddled with it for a few hours and got it almost running. There's only one problem left.... Here's the contents of my test .csv (I'll attach it)
Status,ErrorMessage,CustomLabel
"Ok","""Stuff does things""",123
,none,234
ok,"""Stuff""","This field, contains, commas "" and quotes"
"""3""","",endoffile

here's my string splitting function:
    Function ParseCSV(ByVal str As String) As String()
        Dim alStrings As New ArrayList
        Dim intInStr1 As Integer = 0
        Dim intInStr2 As Integer = 0
        'if 1st field starts with quotes
        If str.Substring(0, 1) = """" Then
            'find the next set of quotes (offset by 1 to account for the first quote)
            intInStr2 = InStr(2, str, """")
            'if the quotes are doubled ignore them (the quotes are part of the text)
            While str.Substring(intInStr2, 1) = """"
                intInStr2 = InStr(intInStr2 + 2, str, """")
            End While
            'if we've reached the end of the line...
            If intInStr2 = 0 Then
                alStrings.Add(str.Substring(1))
            Else
                alStrings.Add(str.Substring(1, intInStr2 - 2)) 'Trims the leading and trailing quotes
            End If
            'after the end quotes find the next comma
            intInStr1 = InStr(intInStr2, str, ",")
        Else
            'no quotes 1st comma starts the next field
            intInStr2 = InStr(str, ",")
            'if we've reached the end of the line...
            If intInStr2 = 0 Then
                alStrings.Add(str)
            Else
                alStrings.Add(str.Substring(0, intInStr2 - 1))
            End If
            intInStr1 = intInStr2
        End If
        While intInStr1 > 0
            'if next field starts with quotes
            If str.Substring(intInStr1, 1) = """" Then
                'find the next set of quotes (offset by 1 to account for the first quote)
                intInStr2 = InStr(intInStr1 + 2, str, """")
                'if the quotes are doubled ignore them (the quotes are part of the text)
                While str.Substring(intInStr2, 1) = """"
                    intInStr2 = InStr(intInStr2 + 2, str, """")
                End While
                'if we've reached the end of the line...
                If intInStr2 = 0 Then
                    alStrings.Add(str.Substring(intInStr1 + 1))
                Else
                    alStrings.Add(str.Substring(intInStr1 + 1, intInStr2 - intInStr1 - 2)) 'Trims the leading and trailing quotes
                End If
                'after the end quotes find the next comma
                intInStr1 = InStr(intInStr2, str, ",")
            Else
                'no quotes next comma starts the next field
                intInStr1 = intInStr1 'so the leading comma doesn't get included

                intInStr2 = InStr(intInStr1 + 1, str, ",")
                'if we've reached the end of the line...
                If intInStr2 = 0 Then
                    alStrings.Add(str.Substring(intInStr1))
                Else
                    alStrings.Add(str.Substring(intInStr1, intInStr2 - intInStr1 - 1))
                End If
                intInStr1 = intInStr2
            End If
        End While
        Return DirectCast(alStrings.ToArray(GetType(String)), String())
    End Function



The line:
While str.Substring(intInStr2, 1) = """"
returns an error when it reaches the end of the string. Any suggestions on how to fix up my code would be greatly appreciated.

Attached File(s)



Is This A Good Question/Topic? 0
  • +

Replies To: VB.Net Splitting a csv line with commas and quotes?

#2 AdamSpeight2008  Icon User is offline

  • MrCupOfT
  • member icon


Reputation: 2263
  • View blog
  • Posts: 9,467
  • Joined: 29-May 08

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 11 March 2010 - 04:07 PM

Would it be simple to use the TextfieldParser Tutorial.
Was This Post Helpful? 0
  • +
  • -

#3 Feldar  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 7
  • Joined: 03-November 09

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 11 March 2010 - 05:01 PM

View PostAdamSpeight2008, on 11 March 2010 - 03:07 PM, said:

Would it be simple to use the TextfieldParser Tutorial.


I'm going to have to say no. The file I will be reading is by no means "Fixed width" nor do I know the number of columns until the file is actually read...

Do note that this function will be called similarly to Split() as each line of the file is read into a string, you then call the function to split that string into an array of varying size. I suppose I should have included the code for that in my original post. Sorry.

Dim TempStr as String
Dim intStatusCol as Integer = -1
Dim intErrorCol as Integer = -1
Dim intCustomLabel as Integer = -1

FileOpen(1, "TestFile.csv", OpenMode.Input)

'Read the header row and get column numbers for the columns you want to read'
TempStr = LineInput(1)
Dim arLine1() as String = Split(TempStr, ",") '<- Here uses Split'
intStatusCol = Array.IndexOf(arLine1, "Status")
intErrorCol = Array.IndexOf(arLine1, "ErrorMessage")
intCustomLabel = Array.IndexOf(arLine1, "CustomLabel")

'Check to make sure all columns are present'
If intStatusCol > -1 and intErrorCol > -1 and intCustomLabel > -1 Then
Do While Not EOF(1)
TempStr = LineInput(1)
Dim arLine2() as String = ParseCSV(TempStr) '<- Here uses my ParseCSV function'
'Do what you want with the info from the selected columns'
MessageBox.Show(arLine2(intStatusCol) & "::" & arLine2(intErrorCol), "Box Header", MessageBox.Buttons.OK)
Loop
End If
FileClose(1)


This post has been edited by Feldar: 11 March 2010 - 05:22 PM

Was This Post Helpful? 0
  • +
  • -

#4 AdamSpeight2008  Icon User is offline

  • MrCupOfT
  • member icon


Reputation: 2263
  • View blog
  • Posts: 9,467
  • Joined: 29-May 08

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 11 March 2010 - 05:14 PM

You haven't read the tutorial properly. Look Again.
Was This Post Helpful? 0
  • +
  • -

#5 Feldar  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 7
  • Joined: 03-November 09

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 12 March 2010 - 05:56 AM

View PostAdamSpeight2008, on 11 March 2010 - 04:14 PM, said:

You haven't read the tutorial properly. Look Again.

It's a nice example of how to read either a file with fixed length fields or character separated fields into a pre-made class.

While a class is nice to have, why would I need to make one if a simple string array will do?
And the code for Fixed Length fields and Character Separated is the same
Private Sub From_FixedLengthFile() 
  ' Define a new TextFieldParser, passing it our file' 
  Dim tfp As New FileIO.TextFieldParser(My.Application.Info.DirectoryPath & "\Files\TestFile01.txt") 
  ' Tell the TextFieldParse to expect a FixedWidth file (By Default it Delimited)' 
  tfp.TextFieldType = FileIO.FieldType.FixedWidth 
  ' Define an array containing the length of each field, in the order they appear'
  Dim FieldLengths() As Integer = {4, 20, 8} 
  ' Define an array containing the token to indicate what follows next (to the line end) is a comment '
  Dim Comments() As String = {"'"} 
  ' Tell our TextFieldParser what the parameters are. '
  tfp.FieldWidths = FieldLengths 
  tfp.CommentTokens = Comments 
 ' Empty the collection of people '
  A_List.Clear() 
  Dim p As person 
  ' Read the file in Try-Catch Block in case of file problems '
  Try 
   ' Have we read to the end of the file? '
   While tfp.EndOfData = False 
        ' Get the fields of the current line, it automatically moves to the next position. '
        Dim FoundField() As String = tfp.ReadFields 
        ' Does it contain the expected number of fields? '
        If FoundField.Count <> 3 Then 
         ' Nope, Display a message. '
         MessageBox.Show(String.Format("Incorrect Number field on line {0}", tfp.LineNumber), "", MessageBoxButtons.OK) 
        Else 
         ' Yes, Create a new person object and fill in the details '
         p = New person 
         p.ID = FoundField(0) 
         p.Name = FoundField(1) 
         p.DOB = ToDate(FoundField(2)) 
         ' Add the person to our collection of people. 
         A_List.Add(p) 
        End If 
   End While 
  Catch ex As Exception 
   Throw ex 
  End Try 
  ' Remember to close the parser. '
  tfp.Close() 
 End Sub

So you'll have to forgive me for assuming that this example only reads fixed length files. I do have 1 question. How does that tutorial code handle a field of "John, Doe" or "John D""oe" ?

Now it's your turn. Did you really read my question?

This post has been edited by Feldar: 12 March 2010 - 08:57 AM

Was This Post Helpful? 0
  • +
  • -

#6 AdamSpeight2008  Icon User is offline

  • MrCupOfT
  • member icon


Reputation: 2263
  • View blog
  • Posts: 9,467
  • Joined: 29-May 08

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 12 March 2010 - 11:56 AM

Sorry about that, that's really embarrassing for me. I'll have to find the tutorial backup and edit it.
After I done it you'll see the reason why suggest it.
Was This Post Helpful? 0
  • +
  • -

#7 Feldar  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 7
  • Joined: 03-November 09

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 15 March 2010 - 11:35 AM

I got my code working now. It's not pretty, but it does what it's supposed to. I call the function like split() and it returns a string array from the comma separated line.
Dim arLine1() As String = ParseCSV(strTemp)


The working code for the function is
    Function ParseCSV(ByVal str As String) As String()
        Dim alStrings As New ArrayList
        Dim intInStr1 As Integer = 0
        Dim intInStr2 As Integer = 0

        'See if the string is empty'
        If str.Length = 0 Then
            alStrings.Add("")
        Else
            'if 1st field starts with quotes'
            If str.Substring(0, 1) = """" Then
                'find the next set of quotes (offset by 1 to account for the first quote)'
                intInStr2 = InStr(2, str, """")
                'if the next set of quotes are not the end of the line'
                If intInStr2 < str.Length Then
                    'if the quotes are doubled ignore them (the quotes are part of the text)'
                    While str.Substring(intInStr2, 1) = """"
                        intInStr2 = InStr(intInStr2 + 2, str, """")
                        'if the end quotes are the last character in the line'
                        If intInStr2 = str.Length Then
                            'exit the loop so it doesn't return an error trying to read the next char
                            Exit While
                        End If
                    End While
                End If
                'if we've reached the end of the line...
                If intInStr2 = 0 Then
                    'the whole line is one field'
                    alStrings.Add(str.Substring(1))
                    'set intInStr2 to the line length so intInStr1 becomes 0 and doesn't infinitely loop
                    intInStr2 = str.Length
                Else
                    alStrings.Add(str.Substring(1, intInStr2 - 2)) 'Trims the leading and trailing quotes'
                End If
                'after the end quotes find the next comma'
                intInStr1 = InStr(intInStr2, str, ",")
            Else
                'no quotes 1st comma starts the next field'
                intInStr2 = InStr(str, ",")
                'if we've reached the end of the line...
                If intInStr2 = 0 Then
                    'the whole line is one field'
                    alStrings.Add(str)
                Else
                    alStrings.Add(str.Substring(0, intInStr2 - 1))
                End If
                'intInStr2 already found the next comma'
                intInStr1 = intInStr2
            End If
            While intInStr1 > 0
                'if the next field is the end of the line (last entry is blank)'
                If intInStr1 = str.Length Then
                    'then add the empty field to the array and set intInstr1 to 0 to exit the loop'
                    alStrings.Add("")
                    intInStr1 = 0
                Else
                    'if next field starts with quotes'
                    If str.Substring(intInStr1, 1) = """" Then
                        'find the next set of quotes (offset by 1 to account for the first quote)'
                        intInStr2 = InStr(intInStr1 + 2, str, """")
                        'if the quotes are doubled ignore them (the quotes are part of the text)'
                        While str.Substring(intInStr2, 1) = """"
                            intInStr2 = InStr(intInStr2 + 2, str, """")
                            'if the end quotes are the last character in the line'
                            If intInStr2 = str.Length Then
                                'exit the loop so it doesn't return an error trying to read the next char
                                Exit While
                            End If
                        End While
                        'if we've reached the end of the line...
                        If intInStr2 = 0 Then
                            alStrings.Add(str.Substring(intInStr1 + 1))
                        Else
                            alStrings.Add(str.Substring(intInStr1 + 1, intInStr2 - intInStr1 - 2)) 'Trims the leading and trailing quotes'
                        End If
                        'after the end quotes find the next comma'
                        intInStr1 = InStr(intInStr2, str, ",")
                    Else
                        'no quotes. next comma starts the next field'
                        intInStr1 = intInStr1 'so the leading comma doesn't get included

                        intInStr2 = InStr(intInStr1 + 1, str, ",")
                        'if we've reached the end of the line...
                        If intInStr2 = 0 Then
                            alStrings.Add(str.Substring(intInStr1))
                        Else
                            alStrings.Add(str.Substring(intInStr1, intInStr2 - intInStr1 - 1))
                        End If
                        intInStr1 = intInStr2
                    End If
                End If
            End While
        End If
        Return DirectCast(alStrings.ToArray(GetType(String)), String())
    End Function



Right now the function works for comma separated lines specifically. I'm sure it wouldn't be too hard to alter it for another delimiter. I tested it with a csv containing theese lines:
Status,ErrorMessage,CustomLabel
"Ok","""Stuff does things""",123
,none,234
ok,"""Stuff""","This field, contains, commas "" and quotes"
""
,

"""3""","",endoffile


Hope this helps someone else too.
Was This Post Helpful? 0
  • +
  • -

#8 Guest_Rjam*


Reputation:

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 06 July 2010 - 02:01 PM

Actually, why not just open the csv in excel and do a find/replace? you can replace all commas with, let's say dash(-) or space or a blank or whatever your heart desires. Then save the file. You now no longer have a comma in the field you are reading thus no need for super long parsing programs that may or may not work in all cases. I am fairly certain in most cases during daily business data dumps this would be acceptable, especially since a dash is a lot easier to see then a comma (^_^).

Of course scripting this find/replace step within your project would be ace. I am working on that now. I've never coded this step before but it must be possible. I will find out (^_^).

Rjam of rjamsworld.com
Was This Post Helpful? 0

#9 ullfindsmit  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 1
  • Joined: 29-December 13

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 29 December 2013 - 03:01 PM

There was a bug in your code on the while loop

						While String2Parse.Substring(intInStr2, 1) = Qualifier
							intInStr2 = InStr(intInStr2 + 2, String2Parse, Qualifier)
							'if the end quotes are the last character in the line'
							If intInStr2 = String2Parse.Length Then
								'exit the loop so it doesn't return an error trying to read the next char
								Exit While
							End If
						End While


I added ' intInStr2 < String2Parse.Length AndAlso ' before getting the substring to avoid error. This happened when you had a qualifier on the last column's data




	Function ParseDelimitedStringWithQualifier(String2Parse As String, Qualifier As String, Delimiter As String) As String()
		Dim alStrings As New list(Of String)
		Dim intInStr1 As Integer = 0
		Dim intInStr2 As Integer = 0

		'See if the string is empty'
		If String2Parse.Length = 0 Then
			alStrings.Add("")
		Else
			'if 1st field starts with quotes'
			If String2Parse.Substring(0, 1) = Qualifier Then
				'find the next set of quotes (offset by 1 to account for the first quote)'
				intInStr2 = InStr(2, String2Parse, Qualifier)
				'if the next set of quotes are not the end of the line'
				If intInStr2 < String2Parse.Length Then
					'if the quotes are doubled ignore them (the quotes are part of the text)'
					While String2Parse.Substring(intInStr2, 1) = Qualifier
						intInStr2 = InStr(intInStr2 + 2, String2Parse, Qualifier)
						'if the end quotes are the last character in the line'
						If intInStr2 = String2Parse.Length Then
							'exit the loop so it doesn't return an error trying to read the next char
							Exit While
						End If
					End While
				End If
				'if we've reached the end of the line...
				If intInStr2 = 0 Then
					'the whole line is one field'
					alStrings.Add(String2Parse.Substring(1))
					'set intInStr2 to the line length so intInStr1 becomes 0 and doesn't infinitely loop
					intInStr2 = String2Parse.Length
				Else
					alStrings.Add(String2Parse.Substring(1, intInStr2 - 2))	'Trims the leading and trailing quotes'
				End If
				'after the end quotes find the next comma'
				intInStr1 = InStr(intInStr2, String2Parse, Delimiter)
			Else
				'no quotes 1st comma starts the next field'
				intInStr2 = InStr(String2Parse, Delimiter)
				'if we've reached the end of the line...
				If intInStr2 = 0 Then
					'the whole line is one field'
					alStrings.Add(String2Parse)
				Else
					alStrings.Add(String2Parse.Substring(0, intInStr2 - 1))
				End If
				'intInStr2 already found the next comma'
				intInStr1 = intInStr2
			End If
			While intInStr1 > 0
				'if the next field is the end of the line (last entry is blank)'
				If intInStr1 = String2Parse.Length Then
					'then add the empty field to the array and set intInstr1 to 0 to exit the loop'
					alStrings.Add("")
					intInStr1 = 0
				Else
					'if next field starts with quotes'
					If String2Parse.Substring(intInStr1, 1) = Qualifier Then
						'find the next set of quotes (offset by 1 to account for the first quote)'
						intInStr2 = InStr(intInStr1 + 2, String2Parse, Qualifier)
						'if the quotes are doubled ignore them (the quotes are part of the text)'
						While intInStr2 < String2Parse.Length AndAlso String2Parse.Substring(intInStr2, 1) = Qualifier
							intInStr2 = InStr(intInStr2 + 2, String2Parse, Qualifier)
							'if the end quotes are the last character in the line'
							If intInStr2 = String2Parse.Length Then
								'exit the loop so it doesn't return an error trying to read the next char
								Exit While
							End If
						End While
						'if we've reached the end of the line...
						If intInStr2 = 0 Then
							alStrings.Add(String2Parse.Substring(intInStr1 + 1))
						Else
							alStrings.Add(String2Parse.Substring(intInStr1 + 1, intInStr2 - intInStr1 - 2))	'Trims the leading and trailing quotes'
						End If
						'after the end quotes find the next comma'
						intInStr1 = InStr(intInStr2, String2Parse, Delimiter)
					Else
						'no quotes. next comma starts the next field'
						intInStr1 = intInStr1 'so the leading comma doesn't get included

						intInStr2 = InStr(intInStr1 + 1, String2Parse, Delimiter)
						'if we've reached the end of the line...
						If intInStr2 = 0 Then
							alStrings.Add(String2Parse.Substring(intInStr1))
						Else
							alStrings.Add(String2Parse.Substring(intInStr1, intInStr2 - intInStr1 - 1))
						End If
						intInStr1 = intInStr2
					End If
				End If
			End While
		End If
		Return alStrings.ToArray
	End Function


Was This Post Helpful? 0
  • +
  • -

#10 ybadragon  Icon User is offline

  • D.I.C Lover
  • member icon

Reputation: 204
  • View blog
  • Posts: 1,263
  • Joined: 11-May 12

Re: VB.Net Splitting a csv line with commas and quotes?

Posted 30 December 2013 - 07:53 AM

Another option is to read the line into memory and replace all the commas that aren't inside quotes with |.

How are you receiving this data? Is it from a client? Can you get them to send it in a more standardized fashion with an actual delimiter, and not just whatever fields they want however they want?
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1