Counting and Identifying Individual Words From an Input Source

26 Replies - 2965 Views - Last Post: 08 February 2010 - 11:20 AM

#1 Guest_Hellreaver*

Posted 06 February 2010 - 11:49 PM

As the title suggests, I'm looking to count the number of words contained in an external input (e.g. input.txt). It's (unfortunately) been quite a while since I've worked with Java, and I was hoping that someone could give me some help in developing the code for this project. I vaguely remember how to access inputs using a buffered reader, as well as how to read the first line of the input stream.

My first question is this (and I'm sorry it's such a noob question; it's just been a really long time since I worked with Java, and even longer since I worked with I/O): how do I set up my code to access the external input in its totality? As in, not just line by line or character by character (as with .read and .readLine)?

My second question is how do I get the input data separated into individual strings? I believe this can be done somehow using StringTokenizer, but I've also heard that it is obsolete, and having never used it before and being exceedingly rusty, I am hesitant to give it a go. I have also read up on the String.split method. From what I've read, this would be exceptionally handy. As an example:

     String[] result = "this is a test".split("\\s");
     for (int x=0; x<result.length; x++)
         System.out.println(result[x]);




This would result in a similar output to what I would like:

Quote

this
is
a
test
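One detail worth checking in the split example above: "\\s" matches a single whitespace character, so two spaces in a row produce an empty string in the result, whereas "\\s+" treats a whole run of whitespace as one separator. A small sketch of the difference (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class SplitSketch {
    // Splits on every single whitespace character.
    static String[] splitSingle(String text) {
        return text.split("\\s");
    }

    // Splits on runs of whitespace; trim() avoids a leading empty string.
    static String[] splitRuns(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String text = "this  is a test"; // note the two spaces after "this"
        System.out.println(Arrays.toString(splitSingle(text)));
        System.out.println(Arrays.toString(splitRuns(text)));
    }
}
```

With the doubled space, the first call yields five elements (one of them empty) and the second yields four, which matters once the words are being counted.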


However, my true goal is to place the individual words that I find into a vector, so that words can be added as the input is read (this was a suggested solution, but I am unsure how to actually implement it). Identical strings need to be printed as a single string (i.e. if there are three "the"s in the input file, the desired output prints a single "the") along with the number of times that string occurred in the input. This means the program needs to separate the input file into words, group words that are identical, count the occurrences of each, and finally print each word along with the number of times it appears in the input file.

Besides a vector, using a map was also suggested. From what little I know of maps, this would make sense, as a map consists of a key (here, a string) and a value (the number of times that string occurs within the input). However, I am extremely unfamiliar with the usage of maps (I know that map must be declared as an interface) and am unsure how to begin to implement this. All of the websites I've looked at show an exceedingly general case of using maps, and I'm unsure which methods would need to be implemented by a class.

For instance, when declaring the map interface, would it look something like this:

public interface Map<String,int> {

...

}




Next, I would like to put in the methods I would need to use for the map, but this is the part where I'm really confused by maps themselves. I'm sure I will need methods such as int put, which will set the value for the key and the matching int and Set<String>. My biggest concern on this comes from a portion of code I've seen:

Valuetype put (Keytype key, Valuetype value);



In this case, the value type would be int, and the keytype would be String. However, I'm not sure about the values for "key" and "value", as I would need many values within those specific variables.
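On the put signature: "key" and "value" are just the parameter names for a single call, so each call stores one pair and the map itself holds as many pairs as are put into it. A minimal sketch using java.util.HashMap (the class name MapPutSketch is hypothetical, and Integer stands in for int because generic type parameters cannot be primitives):

```java
import java.util.HashMap;
import java.util.Map;

class MapPutSketch {
    // Builds a small map to show that each put() call stores one key/value pair.
    static Map<String, Integer> build() {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("the", 1); // first call: stores the pair ("the", 1)
        counts.put("cat", 1); // different key, so the map now has two entries
        counts.put("the", 2); // same key again: the old value 1 is replaced by 2
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = build();
        System.out.println(counts.size());     // number of distinct keys
        System.out.println(counts.get("the")); // value currently stored for "the"
        System.out.println(counts.get("dog")); // null, because "dog" was never put
    }
}
```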

Terribly sorry about my rant here, I'm just extremely confused as to how to go about approaching this problem, and this has been made worse since I have a very short deadline and have not had many hours of sleep.


Thanks in advance for any help, and if there are any questions (because I was probably unclear in my post, and for this I apologize), please don't hesitate to ask. Any help is appreciated.

Thanks,

Hellreaver

Replies To: Counting and Identifying Individual Words From an Input Source

#2 xor-logic

  • HAL9000 was an Apple product

Reputation: 128
  • Posts: 764
  • Joined: 04-February 10

Posted 07 February 2010 - 12:16 AM

For your first question (getting the input all in one), the way I've been doing it is as follows:
//tempFile is a string
//in is the reader
try {
	tempFile = "";
	while (in.available() != 0) {
		//readLine() drops the line break, so add a space to keep words on adjacent lines apart
		tempFile += in.readLine() + " ";
	}
} catch (Exception e) {
	JOptionPane.showMessageDialog(mainGUI.pane,"There was an error reading the file.");
	return;
}



As for breaking it into individual words, that's incredibly easy. Use String.split() and assign the result to a String array. Then you can use a quick for loop to transfer the values of the array into a vector. I don't know if you can assign the result directly to a vector.
Ex:
//requires: import java.util.Vector;
String[] words = tempFile.split("\\s+"); //split on runs of whitespace
Vector<String> words2 = new Vector<String>(5, 5);
for (int i = 0; i < words.length; i++) {
	words2.add(words[i]);
}



However, if your goal is just to output the individual words in an input string, a vector is unnecessary.
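On assigning the result directly: an array cannot be assigned to a Vector, but Vector has a constructor that accepts any Collection, and Arrays.asList wraps an array as a List. A minimal sketch of that shortcut (VectorFromArraySketch and toVector are illustrative names, not part of the code above):

```java
import java.util.Arrays;
import java.util.Vector;

class VectorFromArraySketch {
    // Splits text on whitespace and copies the words into a Vector in one step.
    static Vector<String> toVector(String text) {
        String[] words = text.trim().split("\\s+");
        return new Vector<String>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        Vector<String> words = toVector("this is a test");
        System.out.println(words.size());
        System.out.println(words.get(0));
    }
}
```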

#3 macosxnerd101

  • Self-Trained Economist

Reputation: 10389
  • Posts: 38,446
  • Joined: 27-December 08

Posted 07 February 2010 - 08:27 AM

Actually, when using Generics, you can't use primitives. So if you want a Map<String, int>, you have to declare it as a Map<String, Integer>. Also, the HashMap<K,V> class implements the Map<K,V> interface, so you don't have to worry about creating your own Map<K,V> class.

I've written a snippet that demonstrates the use of Maps. It is called "Find the Mode Using a Map." You might want to check it out. Good luck!

Link: http://www.dreaminco...snippet4518.htm
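The linked snippet is not reproduced here. As a rough sketch of the idea described above, counting words with a Map<String, Integer> backed by a HashMap might look like the following (the class name, method name, and sample sentence are assumptions for illustration; a HashMap does not keep its keys in any particular order):

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Counts how often each whitespace-separated word occurs in text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when text is empty or all whitespace
            }
            Integer previous = counts.get(word); // null if the word is new
            counts.put(word, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat saw the dog and the bird");
        // Print each distinct word once, followed by its count.
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Swapping HashMap for java.util.TreeMap would print the words in alphabetical order if that is wanted.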

#4 Hellreaver

  • New D.I.C Head

Reputation: 0
  • Posts: 15
  • Joined: 07-February 10

Posted 07 February 2010 - 03:49 PM

Thank you both for your replies!

A question with regards to reading the entire input- does that bit of code require me to import anything (i.e. import java.io.*)?

Again, thank you ever so much for both of your replies!

#5 xor-logic

Posted 07 February 2010 - 04:28 PM

Hellreaver, on 07 February 2010 - 02:49 PM, said:

Thank you both for your replies!

A question with regards to reading the entire input- does that bit of code require me to import anything (i.e. import java.io.*)?

Again, thank you ever so much for both of your replies!

It requires you to import java.io.* and also requires that in be a reader of whatever file you're getting.

#6 macosxnerd101

Posted 07 February 2010 - 05:08 PM

The Collections framework (Maps, Sets, Lists, etc.) is part of the java.util package. If you're using Scanner to get the input, it is also part of the java.util package. However, the other console and file input classes (readers and input streams) are part of the java.io package.

#7 Hellreaver

Posted 07 February 2010 - 05:23 PM

xor-logic, on 07 February 2010 - 03:28 PM, said:

It requires you to import java.io.* and also requires that in be a reader of whatever file you're getting.


Ok, whenever I try this code, I get errors. The big ones are that a) the method available() is undefined for "BufferedReader" and b) neither JOptionPane nor mainGUI can be resolved. Any ideas?

#8 xor-logic

Posted 07 February 2010 - 05:29 PM

Hellreaver, on 07 February 2010 - 04:23 PM, said:

Ok, whenever I try this code, I get errors. The big ones are that a) the method available() is undefined for "BufferedReader" and b) neither JOptionPane nor mainGUI can be resolved. Any ideas?


Ok, in reverse order:
-mainGUI was a GUI object I created.
-JOptionPane again deals with GUIs. Just delete that whole line and replace it with a System.out.println() statement.
-In the statement in.readLine();, in is a DataInputStream.
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));



#9 Hellreaver

Posted 07 February 2010 - 05:38 PM

xor-logic, on 07 February 2010 - 04:29 PM, said:

Ok, in reverse order:
-mainGUI was a GUI object I created.
-JOptionPane again deals with GUIs. Just delete that whole line and replace it with a System.out.println() statement.
-In the statement in.readLine();, in is a DataInputStream.
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));



Oh, okay. Awesome, thanks!

I was just scanning through some pages on the internet, and happened to find some info on the class Scanner. Do you happen to know much about it?

#10 macosxnerd101

Posted 07 February 2010 - 05:40 PM

Check out the API for Scanner.

Link: http://java.sun.com/...il/Scanner.html

Edit: For some reason, it double posted.

This post has been edited by macosxnerd101: 07 February 2010 - 05:41 PM


#11 xor-logic

Posted 07 February 2010 - 05:42 PM

Hellreaver, on 07 February 2010 - 04:38 PM, said:

xor-logic, on 07 February 2010 - 04:29 PM, said:

Hellreaver, on 07 February 2010 - 04:23 PM, said:

xor-logic, on 07 February 2010 - 03:28 PM, said:

Hellreaver, on 07 February 2010 - 02:49 PM, said:

Thank you both for your replies!

A question with regards to reading the entire input- does that bit of code require me to import anything (i.e. import java.io.*)?

Again, thank you ever so much for both of your replies!

It requires you to import java.io.* and also requires that in be a reader of whatever file you're getting.


Ok, whenever I try this code, I get errors. The big ones are that a) the method available() is undefined for "BufferedReader" and b) neither JOptionPane nor mainGUI can be resolved. Any ideas?


Ok, in reverse order:
-mainGUI was a GUI object I created.
-JOptionPane again deals with GUIs. Just delete that whole line and replace it with a System.out.println() statement.
-In the statement in.readLine();, in is a DataInputStream.
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));



Oh, okay. Awesome, thanks!

I was just scanning through some pages on the internet, and happened to find some info on the class Scanner. Do you happen to know much about it?


Yay, look at all these quotes. What fun.

I've used the Scanner class in the past when I was making text-based programs, but now that I've moved to graphic interface programs, I haven't really had much need for it. That may change, but for the moment, I've forgotten a lot of it.

#12 macosxnerd101

Posted 07 February 2010 - 05:46 PM

I think Scanner is a useful tool for two tasks. One is as a learning tool for novice programmers to practice getting user input, and the other is as an easy tool for reading in files.

@Xor-logic: I'm guessing you're probably out of the novice stage, but you haven't gotten to File I/O yet. When you do get to File I/O, Scanner is so much easier to use than all the other File readers.
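For the word-splitting task in this thread, a minimal Scanner sketch could look like the following. By default Scanner.next() returns one whitespace-delimited token at a time, so no separate split step is needed. The class and method names are hypothetical, and a Scanner over a String stands in for the file so the example is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWordsSketch {
    // Reads every whitespace-delimited token from the scanner into a list.
    static List<String> readWords(Scanner in) {
        List<String> words = new ArrayList<String>();
        while (in.hasNext()) {
            words.add(in.next());
        }
        return words;
    }

    public static void main(String[] args) {
        // For a real file, new Scanner(new java.io.File("input.txt")) could be
        // used instead; that constructor throws FileNotFoundException.
        Scanner in = new Scanner("this is\na  test");
        System.out.println(readWords(in));
        in.close();
    }
}
```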
Was This Post Helpful? 0
  • +
  • -

#13 xor-logic  Icon User is offline

  • HAL9000 was an Apple product
  • member icon

Reputation: 128
  • View blog
  • Posts: 764
  • Joined: 04-February 10

Re: Counting and Identifying Individual Words From an Input Source

Posted 07 February 2010 - 05:49 PM

macosxnerd101, on 07 February 2010 - 04:46 PM, said:

I think Scanner is a useful tool for two tasks. One is as a learning tool for novice programmers to practice getting user input, and the other is as an easy tool for reading in files.

@Xor-logic: I'm guessing you're probably out of the novice stage, but you haven't gotten to File I/O yet. When you do get to File I/O, Scanner is so much easier to use than all the other File readers.

@macosxnerd101: I actually have gotten to the File I/O stage. But so far, I've been using
DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));


and haven't had any problems, so I haven't looked for anything else.
@Hellreaver: Don't forget there's that little button on the bottom right - "Was this post helpful?" (I flatter myself that I have been, or at least have really tried :) )

#14 macosxnerd101

Posted 07 February 2010 - 05:59 PM

That's a pretty good choice b/c it can read primitives, and it's a little more efficient than Scanner due to the buffering underneath. However, the fact that DataInputStream's readLine() method is deprecated is a big deal when you start reading files where all the attributes of an Object are stored on one line.

The selling points for me in terms of Scanner are definitely its ease of use and exceptional functionality (using custom delimiters, scanning for patterns, testing the type of the next token, etc.).
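A small sketch of two of those features, custom delimiters and testing the type of the next token, assuming a comma-separated line as input (ScannerDelimiterSketch, sumInts, and the sample line are made up for illustration):

```java
import java.util.Scanner;

class ScannerDelimiterSketch {
    // Adds up the integer tokens in a comma-separated line, skipping the rest.
    static int sumInts(String line) {
        Scanner in = new Scanner(line);
        in.useDelimiter(",\\s*"); // a comma followed by optional whitespace
        int sum = 0;
        while (in.hasNext()) {
            if (in.hasNextInt()) {
                sum += in.nextInt(); // next token parses as an int
            } else {
                in.next(); // not an int: consume and ignore the token
            }
        }
        in.close();
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumInts("3, apples, 4,5"));
    }
}
```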

#15 xor-logic

Posted 07 February 2010 - 06:03 PM

macosxnerd101, on 07 February 2010 - 04:59 PM, said:

That's a pretty good choice b/c it can read primitives, and it's a little more efficient than Scanner due to the buffering underneath. However, the fact that DataInputStream's readLine() method is deprecated is a big deal when you start reading files where all the attributes of an Object are stored on one line.

The selling points for me in terms of Scanner are definitely its ease of use and exceptional functionality (using custom delimiters, scanning for patterns, testing the type of the next token, etc.).

We're hijacking this thread! j/k. But thanks for reminding me about the deprecation.
@Hellreaver, you might want to suppress deprecation warnings.
	@SuppressWarnings("deprecation")
Basically, without that bit the compiler bitches about deprecation every time you compile.
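An alternative to suppressing the warning is to avoid the deprecated call altogether: BufferedReader.readLine() is not deprecated and returns null at the end of the input. A sketch, assuming the input is plain text (ReadAllSketch and readAll are illustrative names; for a real file, a FileReader would be passed in place of the StringReader used here):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadAllSketch {
    // Reads everything from source into one String, line by line.
    static String readAll(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) { // null signals end of input
            // readLine() strips the line terminator, so put a newline back
            text.append(line).append('\n');
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // For a file: readAll(new java.io.FileReader("input.txt"))
        String text = readAll(new StringReader("first line\nsecond line"));
        System.out.print(text);
    }
}
```

The resulting String can then be passed to split("\\s+") as earlier in the thread.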