#1 rhett.moeller

Trouble with String.Split

Posted 21 April 2007 - 05:41 AM

I'm working on a personal project in C# that takes a line from a text file and stores the words individually into a database for later review/study, but I've run into a snag. Can anyone provide some help?

Here's what I have working so far:

1) I read in a text file, one line at a time.

2) I break a given line into component words. For instance, if the stream reads, "Hi, my name is Fred!", the string is split into Hi,/my/name/is/Fred!. So far, so good.

Here's where I need help:

3) This is where my lack of understanding kicks in. I want to further test each individual word in order to store punctuation in a separate variable. For instance, once this pass is done, I should have an ID, a word, and punctuation (if any) for each word:

"1", "Hi", ","
"2", "my", ""
"3", "name", ""
"4", "is", ""
"5", "Fred", "!"

What I'm not understanding is how to review the individual characters of each word in the array, testing to see whether the character is a letter or punctuation. Has anyone here had to do this before? I'm assuming there's another Split involved, but I'm not sure how to implement it.

Here's the code I'm working with right now:

public static void ParseSourceFile(StreamReader sourceFile, string fullTitle, string shortTitle)
{
	// Variable declaration area
	string line;
	char[] lineDelimiters = { ' ', '\t' };
	int lineCounter = 0;
	int wordPosition = 0;
	string recordID;

	try
	{
		// Read the first line of the file
		line = sourceFile.ReadLine();

		// If the first line is the title of the poem, skip it and read the next line
		if (line == fullTitle)
		{
			line = sourceFile.ReadLine();
		}

		while (line != null)
		{
			// Increment the line counter
			lineCounter++;

			string[] words = line.Split(lineDelimiters);

			foreach (string s in words)
			{
				wordPosition++;

				// ************************************************
				// This is the trouble area: inspect words individually
				// (pseudocode; this line does not compile)
				string[] letter = s.Split('a'. . . 'Z');

				foreach (string i in letter)
				{
					// Check for letter or punctuation
				}
				// ************************************************

				// Simulate sending to a database
				Console.WriteLine(s + " " + shortTitle + lineCounter + wordPosition);
			}

			// Read the next line
			line = sourceFile.ReadLine();
		}
	}
	finally
	{
		// Close the reader even if something above throws
		sourceFile.Close();
	}
	Console.ReadLine();
}



Replies To: Trouble with String.Split

#2 dkirkland

Re: Trouble with String.Split

Posted 22 April 2007 - 12:48 PM

Hi rhett,

As far as I understand string splitting, the whole exercise isn't going to be as simple as calling a single .NET BCL routine to split the text the way you want. You need to split the text into words first and then check each word for punctuation.
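To check a word character by character, something along these lines might work. It's only a rough sketch: `WordSplitter` and `SplitWord` are names made up for illustration, and it assumes punctuation only ever sits at the end of a word. `char.IsPunctuation` is the standard .NET test used here.

```csharp
using System;

public static class WordSplitter
{
    // Hypothetical helper: separates trailing punctuation from a word.
    // Assumes punctuation only appears at the end of the token.
    public static void SplitWord(string token, out string word, out string punctuation)
    {
        int end = token.Length;

        // Walk backwards while the characters are punctuation
        while (end > 0 && char.IsPunctuation(token[end - 1]))
        {
            end--;
        }

        word = token.Substring(0, end);
        punctuation = token.Substring(end);
    }

    public static void Main()
    {
        string word, punctuation;
        SplitWord("Fred!", out word, out punctuation);
        Console.WriteLine("\"{0}\", \"{1}\"", word, punctuation); // prints "Fred", "!"
    }
}
```

Because it only looks at the end of the word, an apostrophe inside something like "don't" stays with the word.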

I have included a quick solution below. If anyone knows a nicer way of doing this then please tell! It's always good to learn something new :)

using System;
using System.Collections.Generic;

namespace DreamInCode.Help
{
	public class LanguageParser
	{
		public struct LangElement
		{
			private int _id;
			public int Id
			{
				get { return _id; }
				set { _id = value; }
			}

			private string _word;
			public string Word
			{
				get { return _word; }
				set { _word = value; }
			}

			private string _punctuation;
			public string Punctuation
			{
				get { return _punctuation; }
				set { _punctuation = value; }
			}

			public LangElement(int id, string word, string punctuation)
			{
				_id = id;
				_word = word;
				_punctuation = punctuation;
			}
		}

		public List<LangElement> ParseText(string text)
		{
			// Normalise Windows line endings, then split on whitespace.
			// RemoveEmptyEntries drops the empty strings that consecutive
			// delimiters would otherwise produce.
			text = text.Replace("\r\n", "\n");
			char[] delimiters = new char[] { ' ', '\n', '\r', '\t' };
			char[] punctuation = new char[] { ',', '.', ';', ':', '!' };
			List<LangElement> elementList = new List<LangElement>();
			string[] words = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
			int wordId = 0;
			for (int w = 0; w < words.Length; w++)
			{
				string word = words[w];
				wordId++;
				string punct = "";
				// Find the earliest punctuation character, whichever one it is,
				// and cut the word there. Everything from that point on is
				// stored as punctuation.
				int i = word.IndexOfAny(punctuation);
				if (i >= 0)
				{
					punct = word.Substring(i);
					word = word.Substring(0, i);
				}
				elementList.Add(new LangElement(wordId, word, punct));
			}
			return elementList;
		}

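		public static void TestParser()
		{
			string test = "Hi, my name is Fred!";
			LanguageParser lp = new LanguageParser();
			// Print each element so the result can be checked by eye
			foreach (LangElement e in lp.ParseText(test))
			{
				Console.WriteLine("\"{0}\", \"{1}\", \"{2}\"", e.Id, e.Word, e.Punctuation);
			}
		}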
	}
}
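One possible alternative, again only as a rough sketch: a regular expression can pull the word and whatever follows it apart in one pass. `RegexSketch` and `Tokenize` are made-up names, and the pattern assumes a "word" is a run of `\w` characters followed by zero or more characters that are neither `\w` nor whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSketch
{
    // Group 1: the word (letters, digits, underscore).
    // Group 2: any non-word, non-whitespace characters right after it.
    private static readonly Regex Token = new Regex(@"(\w+)([^\w\s]*)");

    // Hypothetical helper: returns one match per word in the text.
    public static MatchCollection Tokenize(string text)
    {
        return Token.Matches(text);
    }

    public static void Main()
    {
        int id = 0;
        foreach (Match m in Tokenize("Hi, my name is Fred!"))
        {
            id++;
            Console.WriteLine("\"{0}\", \"{1}\", \"{2}\"",
                id, m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}
```

One thing to watch for: this pattern treats the apostrophe in a word like "don't" as punctuation and splits it into two matches, so it would need adjusting for text with contractions.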



Hope this points you in the right direction....

#3 rhett.moeller

Re: Trouble with String.Split

Posted 22 April 2007 - 06:23 PM

:D

I appreciate the tip. I'll try it out and let you know how it goes, hopefully tomorrow. It's been frustrating trying to get this going, and I thank you in advance for your solution!

Rhett