10 Replies - 953 Views - Last Post: 05 September 2012 - 09:18 AM

#1 WickedBetrayal

  • D.I.C Head

Reputation: 0
  • Posts: 84
  • Joined: 16-February 11

Exclude Everything But Letters In Fstream Input (Using >>)

Posted 04 September 2012 - 08:45 AM

I need to input words from a text file but I need to have only alpha characters.

I need to exclude everything but 'a'-'z'.


Like:

dataFile.open(fileName, ios::in);
dataFile >> word;




It reads each word until it hits a space character. How can I get it to also stop reading when it hits underscores, periods, commas, etc.?

Thanks!

PS: Can you please not delete it this time? I want some help, not someone to write code for me. How did I ask for that? I have a whole program written and I need this to finalize it. It seems very simple.

EDIT:

Some additional code:

vector<WordFrequency> myVector;
	fstream dataFile;
	string fileName;
	cout << "Enter the file name: ";
	cin >> fileName;
	dataFile.open(fileName, ios::in);
	if (!dataFile)
	{
		cout << "ERROR: Cannot open file.\n";
		return 0;
	}

	string word;
	int frequency;

	while (true)
	{
		dataFile >> word;
		frequency=1;
		myVector.push_back(WordFrequency(1,word));
		if (dataFile.fail())
			break;
	}



I am doing the same assignment as another post in this forum, but I went about it differently and it worked out to be easier (maybe not more efficient, but more on my level of experience). Basically I need to find the top 50 most common words. Words, not commas etc. That is why I want to exclude them, so the program works better.

This post has been edited by WickedBetrayal: 04 September 2012 - 10:47 AM



Replies To: Exclude Everything But Letters In Fstream Input (Using >>)

#2 jimblumberg

Reputation: 4060
  • Posts: 12,534
  • Joined: 25-December 09

Posted 04 September 2012 - 08:56 AM

You need to show what you tried. But I recommend using getline() to retrieve the entire line into a std::string, then either remove the undesired characters or change them to spaces, using the string functions. Then further process this string with a stringstream.
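A minimal sketch of that approach, for illustration only: read each line with getline(), overwrite every non-letter with a space, then split the cleaned line with a stringstream. The name readWords is made up for this example, and std::isalpha() is my choice for the letter test (it also accepts upper case); nothing here comes from the original program.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read every line of `in`, turn each non-letter
// into a space, then split the cleaned line into words.
std::vector<std::string> readWords(std::istream& in)
{
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line))
    {
        for (std::string::size_type i = 0; i < line.size(); ++i)
        {
            // Cast to unsigned char: passing a negative char to isalpha is undefined.
            if (!std::isalpha(static_cast<unsigned char>(line[i])))
                line[i] = ' ';
        }
        std::istringstream cleaned(line);  // now only letters and spaces remain
        std::string word;
        while (cleaned >> word)
            words.push_back(word);
    }
    return words;
}
```

Called as readWords(dataFile), this returns the words in file order. Note that replacing with spaces splits "you're" into "you" and "re"; erasing the character instead of replacing it would keep it as one word.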


Jim

#3 WickedBetrayal

Posted 04 September 2012 - 10:44 AM

Well, while doodling in class just now I came up with a few for loops that I think I can use to remove anything but a-z.

Basically I will take each string "word" and keep an array of a-z in positions 0-25. Then I will compare each character against it (maybe only the first and last characters, because nothing unacceptable will be in the middle; for example, in "you're" there is no reason to delete the ').

I will post back with any update

#4 jimblumberg

Posted 04 September 2012 - 10:57 AM

Since you are using std::string, you can call std::string::find_first_of() in a loop and replace or erase the characters it finds in your string.
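As a rough sketch of such a loop: the version below uses find_first_not_of() rather than find_first_of(), simply because listing the 26 allowed letters is shorter than listing every unwanted character. The helper name keepLowercase is invented for the example.

```cpp
#include <string>

// Hypothetical helper: erase every character that is not a lowercase letter.
std::string keepLowercase(std::string word)
{
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    std::string::size_type pos = word.find_first_not_of(letters);
    while (pos != std::string::npos)
    {
        word.erase(pos, 1);                          // remove exactly one character
        pos = word.find_first_not_of(letters, pos);  // resume at the same index
    }
    return word;
}
```

Because the search restarts at the index that was just erased, no character is skipped, and the loop ends as soon as the search returns npos.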


Also you could simplify your while loop like:
	while (dataFile >> word)
	{
		frequency=1;
		myVector.push_back(WordFrequency(1,word));
	}



Jim

This post has been edited by jimblumberg: 04 September 2012 - 10:59 AM


#5 WickedBetrayal

Posted 04 September 2012 - 11:20 AM

I got this.

 char alphabet[26] = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};


	string word;
	int frequency;

	while (dataFile >> word)
	{
		for (unsigned int i=0;i<word.length(); i++)
		{
			for (unsigned int j=0;j<25;j++)
			{
				if (word[i] != alphabet[j])
				{
					word.erase(i);
				}
			}
		}
		frequency=1;
		myVector.push_back(WordFrequency(1,word));
		if (dataFile.fail())
			break;
	}



It should work, but I get an xstring: string subscript out of range error when I compile around the line:
if (word[i] != alphabet[j])


#6 vividexstance

  • D.I.C Lover

Reputation: 659
  • Posts: 2,260
  • Joined: 31-December 10

Posted 04 September 2012 - 11:31 AM

How about writing your own input function? Have the function return a bool that is true if it was able to input a word and false otherwise. The function could input just a character at a time and you just check if the character is alphabetical. This way you could use the code that you had before, you just change the input stream operation with a call to the function you wrote.

*EDIT*: I'm just curious as to where you increment the word frequency if you read in a word that's already been read in?

This post has been edited by vividexstance: 04 September 2012 - 11:32 AM


#7 WickedBetrayal

Posted 04 September 2012 - 11:38 AM

vividexstance, on 04 September 2012 - 01:31 PM, said:

How about writing your own input function? Have the function return a bool that is true if it was able to input a word and false otherwise. The function could input just a character at a time and you just check if the character is alphabetical. This way you could use the code that you had before, you just change the input stream operation with a call to the function you wrote.

*EDIT*: I'm just curious as to where you increment the word frequency if you read in a word that's already been read in?


Sorry. Here is the rest of inputting if the word already has been stored:

for (unsigned int i=0;i<myVector.size();i++)
	{
		for (unsigned int j=i+1;j<myVector.size();)
		{
			if (myVector[i].word == myVector[j].word)
			{
				myVector.at(i).frequency++;
				myVector.erase(myVector.begin()+j);
			}
			else
				j++;
		}
	}



#8 jimblumberg

Posted 04 September 2012 - 12:19 PM

Quote

It should work, but I get an xstring: string subscript out of range error when I compile around the line:

You need to look closer at the std::string::erase() function. If you pass just one parameter to this function, it erases from that position to the end of the string. Is that what you intend? Also, do you intend to erase upper-case letters as well?
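To make the difference concrete, a tiny sketch (the two helper names are invented for the example): one-argument erase() truncates the string, while erase(pos, 1) removes a single character.

```cpp
#include <string>

// Hypothetical helper: erase(pos) removes everything from pos to the end.
std::string eraseToEnd(std::string s, std::string::size_type pos)
{
    s.erase(pos);
    return s;
}

// Hypothetical helper: erase(pos, 1) removes only the character at pos.
std::string eraseOne(std::string s, std::string::size_type pos)
{
    s.erase(pos, 1);
    return s;
}
```

With the string "don't" and position 3, eraseToEnd gives "don" and eraseOne gives "dont". In the posted loop, the first mismatch therefore wipes out everything from i onward, and the next word[i] can land past the end of the now shorter string, which would explain the subscript error.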

You really should take a look at some of the std::string member functions, like std::string::find_first_not_of() and std::string::find_first_of(). They will probably simplify your searches.

Jim

This post has been edited by jimblumberg: 04 September 2012 - 12:19 PM


#9 WickedBetrayal

Posted 04 September 2012 - 12:48 PM

I fixed it by using erase(position,number of elements to erase).

But I guess when it is done erasing, it messes up the size and I keep getting the out of range string error. Ah.
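A sketch of one way around the shifting-size problem, assuming the goal is to keep only 'a'-'z': test each character once against the whole alphabet, and advance the index only when nothing was erased. The name stripNonLetters is hypothetical.

```cpp
#include <string>

// Hypothetical helper: remove every character that is not in 'a'-'z'.
std::string stripNonLetters(std::string word)
{
    const std::string letters = "abcdefghijklmnopqrstuvwxyz";
    std::string::size_type i = 0;
    while (i < word.length())              // length is re-read on every pass
    {
        if (letters.find(word[i]) == std::string::npos)
            word.erase(i, 1);              // next character slides into index i
        else
            ++i;                           // only move on when nothing was erased
    }
    return word;
}
```

Since the length is checked again after every erase and i is never advanced past a character that has not been tested, word[i] always stays in range.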

#10 #define

  • Duke of Err

Reputation: 1342
  • Posts: 4,601
  • Joined: 19-February 09

Posted 04 September 2012 - 07:01 PM

Hi, some bits.

There is a function that tests whether a character is alphabetic: isalpha(), in the <cctype> (ctype.h) header.

The string::erase and vector::erase functions return a valid iterator when used with iterators.
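The two points above can be combined in a short sketch: walk the string with an iterator, test each character with isalpha(), and continue from the iterator that erase() returns. The function name eraseNonAlpha is made up for the example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: erase every non-alphabetic character using iterators.
std::string eraseNonAlpha(std::string word)
{
    std::string::iterator it = word.begin();
    while (it != word.end())
    {
        // Cast to unsigned char: passing a negative char to isalpha is undefined.
        if (!std::isalpha(static_cast<unsigned char>(*it)))
            it = word.erase(it);  // erase returns the iterator after the removed char
        else
            ++it;
    }
    return word;
}
```

Reassigning it from the return value matters, because the old iterator is invalidated by the erase. Unlike the 'a'-'z' array, isalpha() keeps upper-case letters too.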

Possibly, a different way - before entering the word in the vector, you could test whether it already exists.
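A sketch of that idea, under the assumption that WordFrequency looks roughly like the struct below (its layout is inferred from the calls in the thread, not taken from the real class). The helper addWord is hypothetical.

```cpp
#include <string>
#include <vector>

// Assumed layout, guessed from WordFrequency(1, word), .word and .frequency.
struct WordFrequency
{
    int frequency;
    std::string word;
    WordFrequency(int f, const std::string& w) : frequency(f), word(w) {}
};

// Hypothetical helper: bump the count if the word is already stored,
// otherwise append it with a frequency of 1.
void addWord(std::vector<WordFrequency>& words, const std::string& word)
{
    for (std::vector<WordFrequency>::size_type i = 0; i < words.size(); ++i)
    {
        if (words[i].word == word)
        {
            ++words[i].frequency;
            return;
        }
    }
    words.push_back(WordFrequency(1, word));
}
```

Calling addWord(myVector, word) inside the read loop would make the separate duplicate-removal pass unnecessary. A std::map<std::string, int> would do the same lookup faster, at the cost of changing the container.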

#11 vividexstance

Posted 05 September 2012 - 09:18 AM

#define, on 04 September 2012 - 10:01 PM, said:

Possibly, a different way - before entering the word in the vector, you could test whether it already exists.

I was thinking the same thing; it seemed strange to be doing more work than is really necessary.