4 Replies - 1584 Views - Last Post: 03 November 2009 - 08:57 AM

#1 Koreos

Java Collections

Posted 02 November 2009 - 12:42 PM

In my current college project, I have to read a .txt file and count the number of occurrences of each unique word. It is required to use a collection.

As you can see in my code, I read the file using a pattern as the delimiter to skip punctuation and whitespace. My problem is that I don't know how to use collections... well, actually I know how to use their methods, but I can't fully grasp the concept.

What kind of collection would let me keep a list of unique words along with the number of occurrences of each one? If I could use a two-dimensional array, this assignment would be easy, but alas, the teacher didn't go into much detail on collections...

Thank you.

import java.io.*;
import java.util.Scanner;

public class WordOccurrence {	
	Scanner sc;
	public static void main(String[] args) {
		new WordOccurrence();
	}
	WordOccurrence() {
		/** File setup */
		try {
			sc = new Scanner(new File("/Volumes/Data/Code/Eclipse workspace/project4/src/Obama_Education_Speech.txt"));
			readFile(sc);
		} catch (FileNotFoundException e){
			System.out.println("Error: File not found.");
		}
	}
	private void readFile(Scanner sc) {
		sc.useDelimiter("\\s|[,.'?\"]");
		while (sc.hasNext()) {
			System.out.println(" << " + sc.next() + " >> ");
		}
		
	}
}




Replies To: Java Collections

#2 Momerath

Re: Java Collections

Posted 02 November 2009 - 12:44 PM

Hashtable would work, with the words as the keys and the occurrences as the values.
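A minimal sketch of that idea, assuming the words have already been split into an array; the class name and sample words are hypothetical:

```java
import java.util.Hashtable;
import java.util.Map;

public class HashtableWordCount {
    // Word -> number of occurrences, stored in a Hashtable.
    static Map<String, Integer> count(String[] words) {
        Hashtable<String, Integer> counts = new Hashtable<String, Integer>();
        for (String w : words) {
            Integer previous = counts.get(w); // null if this word hasn't been seen yet
            counts.put(w, previous == null ? 1 : previous + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"the", "cat", "saw", "the", "dog"};
        System.out.println(count(words).get("the")); // prints 2
    }
}
```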

#3 macosxnerd101

Re: Java Collections

Posted 02 November 2009 - 01:09 PM

HashMap would be more efficient than Hashtable because Hashtable's methods are synchronized, which adds overhead a single-threaded program doesn't need. Otherwise, you work with HashMap in a very similar way. To get the number of occurrences for each word, get the key set from the HashMap with keySet(), then iterate through that set, calling get() on each key to retrieve its count.

For more information on HashMap, check out the following link:
http://www.j2ee.me/j...il/HashMap.html

If you need any help implementing it, feel free to post. I've also written a snippet called Finding the Mode With A Map if you want to check it out. Good luck! :)
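A rough sketch of the key-set walk described above; the class name and the pre-filled counts are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class KeySetWalk {
    // Walks the map's key set, fetching each word's count with get(), and sums the counts.
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (String word : counts.keySet()) {
            int n = counts.get(word); // value stored for this key
            System.out.println(word + "\t" + n);
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("to", 2);
        counts.put("be", 2);
        counts.put("or", 1);
        counts.put("not", 1);
        // The per-word lines come out in no particular order; the last line is "Total: 6, unique: 4".
        System.out.println("Total: " + total(counts) + ", unique: " + counts.size());
    }
}
```

HashMap makes no promise about the order in which keys are returned; a TreeMap, as used later in this thread, returns them in sorted order.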

#4 Koreos

Re: Java Collections

Posted 02 November 2009 - 02:20 PM

Thank you, I got it working.
import java.io.*;
import java.util.*;

public class WordOccurrence {	
	Scanner sc;
	HashMap<String, Integer> wordList;
	
	public static void main(String[] args) {
		new WordOccurrence();
	}
	WordOccurrence() {
		/** File setup */
		try {
			sc = new Scanner(new File("/Volumes/Data/Code/Eclipse workspace/project4/src/Obama_Education_Speech.txt"));
			readFile();
			printResults();
		} catch (FileNotFoundException e){
			System.out.println("Error: File not found.");
		}
	}
	private void readFile() {
		/** read token by token */
		sc.useDelimiter("\\s|[,./<>?;:\"[]\\{}|!@#$%^&*()_+-=]]");
		wordList = new HashMap<String, Integer>();
		while (sc.hasNext()) {
			String word = sc.next().toLowerCase(); // toLowerCase so we don't get duplicates
			if (!(wordList.containsKey(word))) // do we have this word in the hashmap yet?
				wordList.put(word, 1); // add to hashmap
			else
				wordList.put(word, wordList.get(word) + 1); // increase counter
		}
	}
	private void printResults() {
		Iterator<String> it = wordList.keySet().iterator();
		int totalCount = 0;
		int uniqueCount = 0;
		while (it.hasNext()) {	// iterate through the hashmap's key set
			String word = it.next();
			System.out.println("Word: " + word + "\t\tCount: " + wordList.get(word));
			totalCount += wordList.get(word);
			uniqueCount++;
		}
		System.out.println("\nTotal Word Count: " + totalCount);
		System.out.println("Total UNIQUE Word Count: " + uniqueCount);
	}
}
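On Java 8 or later (newer than what this thread was written against), the containsKey/put/get sequence in readFile() can be collapsed into one Map.merge call, and the results loop can iterate over entrySet() so each count is looked up only once. A minimal sketch, with a hypothetical class name and an in-memory string standing in for the speech file:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class MergeCount {
    // Counts whitespace-separated tokens, ignoring case.
    static Map<String, Integer> count(Scanner sc) {
        Map<String, Integer> wordList = new HashMap<>();
        while (sc.hasNext()) {
            String word = sc.next().toLowerCase();
            // Stores 1 for a new word, otherwise adds 1 to the existing count.
            wordList.merge(word, 1, Integer::sum);
        }
        return wordList;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordList = count(new Scanner("Yes we can yes"));
        int totalCount = 0;
        for (Map.Entry<String, Integer> e : wordList.entrySet()) {
            System.out.println("Word: " + e.getKey() + "\t\tCount: " + e.getValue());
            totalCount += e.getValue();
        }
        System.out.println("\nTotal Word Count: " + totalCount);             // 4
        System.out.println("Total UNIQUE Word Count: " + wordList.size()); // 3
    }
}
```

The unique count is simply the map's size(), so no separate counter is needed.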



#5 Koreos

Re: Java Collections

Posted 03 November 2009 - 08:57 AM

I am in need of help, again...

Everything is working as it should except the pattern matching. If you take a look at the top of the output, you'll see a blank word counted 175 times.

The goal here is to count every word that doesn't start with a symbol, while discarding all punctuation, symbols and numbers. Special attention has to be paid to the apostrophe ('), as it's used in some words. I've tried using the pattern [^A-Za-z]|'|[,\\s]+, but the output didn't come out as it's supposed to (3758 words...).
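Why the blank entry appears: Scanner returns an empty token whenever one delimiter match ends exactly where the next one begins. In [,\\s]+|[,.;:()?!\"-]|[0-9] the alternatives match separately, so ". " (a period followed by a space) is consumed as two delimiters with an empty token between them, and every digit is its own delimiter too. Folding everything into one negated character class with + makes any run of non-letter characters a single delimiter. The sketch below compares the two on a made-up sentence. The class name, the sample text, and the choice to keep only ASCII letters plus the straight and curly (U+2019) apostrophes are assumptions, and it presumes the text was decoded with the file's correct character encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // The delimiter from the post: three alternatives, each matched on its own.
    static final String OLD = "[,\\s]+|[,.;:()?!\"-]|[0-9]";
    // Hypothetical replacement: any run of characters other than ASCII letters,
    // the straight apostrophe, or the curly apostrophe U+2019 is one delimiter.
    static final String NEW = "[^A-Za-z'\u2019]+";

    // Returns tokens exactly as Scanner produces them, empty ones included.
    static List<String> rawTokens(String text, String delimiter) {
        List<String> out = new ArrayList<String>();
        Scanner sc = new Scanner(text);
        sc.useDelimiter(delimiter);
        while (sc.hasNext()) {
            out.add(sc.next());
        }
        sc.close();
        return out;
    }

    // Splits with NEW, then strips apostrophes used as quotation marks at either end.
    static List<String> words(String text) {
        List<String> out = new ArrayList<String>();
        for (String token : rawTokens(text, NEW)) {
            String word = token.replaceAll("^['\u2019]+|['\u2019]+$", "").toLowerCase();
            if (!word.isEmpty()) { // a lone apostrophe would otherwise count as a word
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = "Stay in school. Don't quit! It's 2009 \u2013 'really'.";
        System.out.println(rawTokens(sample, OLD)); // contains "" entries, e.g. right after "school"
        System.out.println(words(sample));          // [stay, in, school, don't, quit, it's, really]
    }
}
```

Trimming apostrophes at the ends of a token also turns plural possessives such as "parents’" into "parents"; whether that counts as the same word is a choice for the assignment.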

I'm not at all confident about the word count. If you paste the original file into a word processor, it reports about 2437 words in total... I'm getting 2627. :crazy:

Thanks again!

import java.io.*;
import java.text.NumberFormat;
import java.util.*;

public class WordOccurrence {	
	Scanner sc;
	TreeMap<String, Integer> wordList;
	int noOfCharacters = 0;
	
	public static void main(String[] args) {
		new WordOccurrence();
	}
	WordOccurrence() {
		/** File setup */
		try {
			sc = new Scanner(new File("/Volumes/Data/Code/Eclipse workspace/project4/src/Obama_Education_Speech.txt"));
			readFile();
			printResults();
		} catch (FileNotFoundException e){
			System.out.println("Error: File not found.");
		}
	}
	private void readFile() {
		/** read token by token */
		sc.useDelimiter("[,\\s]+|[,.;:()?!\"-]|[0-9]");
		wordList = new TreeMap<String, Integer>();
		while (sc.hasNext()) {
			String word = sc.next().toLowerCase(); // toLowerCase so we don't get duplicates
			noOfCharacters += word.length();
			if (!(wordList.containsKey(word))) // do we have this word in the map yet?
				wordList.put(word, 1); // add to the map
			else
				wordList.put(word, wordList.get(word) + 1); // increase counter
		}
	}
	private void printResults() {
		Iterator<String> it = wordList.keySet().iterator();
		int totalCount = 0;
		int uniqueCount = 0;
		while (it.hasNext()) {	// iterate through the map's keys (sorted, since wordList is a TreeMap)
			String word = it.next();
			System.out.println(wordList.get(word) + "\t" + word);
			totalCount += wordList.get(word);
			uniqueCount++;
		}
		
		NumberFormat nf = NumberFormat.getInstance();
		System.out.println("\nTotal Word Count: \t\t" + totalCount);
		System.out.println("Total UNIQUE Word Count: \t" + uniqueCount);
		System.out.println("Average word length: \t\t" + nf.format(noOfCharacters/new Double(uniqueCount)));
	}
}


175	
62	a
1	able
11	about
2	across
1	activity
1	adjusting
1	admit
1	adult
2	adults
1	advantages
1	affected
2	afraid
1	again
3	ago
1	aids
14	all
1	along
1	altos
3	always
5	america
1	american
7	an
86	and
2	andoni
1	answer
4	any
1	anyone
1	anything
1	architect
13	are
4	aren‚äôt
1	arlington
2	around
1	articles
5	as
4	ask
1	asking
1	asleep
1	assignment
14	at
1	athlete
2	attention
1	attitude
2	back
2	bad
2	basketball
12	be
6	because
1	become
1	bed
1	been
4	before
1	behave
1	behind
6	being
1	believe
4	best
1	better
1	bills
2	bless
3	book
1	books
1	boost
1	born
1	bouncing
1	brain
1	brown
1	build
1	bullied
1	buster
15	but
1	by
1	california
1	calling
12	can
2	cancer
2	can‚äôt
1	care
1	career
1	careers
1	center
3	challenges
2	chances
1	changed
1	chicago
1	chose
1	circumstances
1	civil
4	class
1	classes
1	classrooms
1	click
1	coach
5	college
2	come
1	comes
1	commit
1	communicate
1	community
1	companies
1	complain
1	completely
1	computers
1	contribution
6	could
1	could‚äôve
1	counselor
7	country
1	courage
1	court
1	create
1	creativity
1	crime
1	critical
1	cure
1	cut
1	cutting
7	day
1	debate
4	decide
1	decided
1	dedicated
1	define
1	depression
2	deserve
2	destiny
2	determine
3	develop
2	did
4	didn‚äôt
1	different
1	differently
1	difficult
1	discouraged
1	discover
1	discoveries
1	discrimination
1	discuss
1	diseases
20	do
1	doctor
3	doesn‚äôt
4	doing
1	done
12	don‚äôt
2	down
1	dr
1	drafts
1	dreams
2	drop
1	dropping
1	during
5	each
1	early
1	earned
1	easily
1	easy
1	economy
10	education
1	effort
2	either
3	end
1	endured
1	energy
2	english
4	enough
2	environment
1	equipment
5	even
1	ever
11	every
1	everybody
1	everyone
4	everything
3	excuse
4	expect
1	expected
2	extra
1	extracurricular
1	facebook
1	faced
1	failed
2	failures
1	fair
3	fall
1	families
4	family
2	father
3	feel
2	feeling
1	fell
1	felt
4	few
1	fifty
1	fight
1	finally
1	find
8	first
1	fit
1	fix
1	flu
2	focus
1	focused
1	follow
28	for
1	fortunate
2	foster
2	fought
1	found
1	founded
1	free
1	friday
1	friends
12	from
1	front
1	fulfill
4	future
1	gain
1	games
1	gangs
10	get
3	getting
7	give
2	given
1	glad
5	go
1	goal
3	goals
2	god
6	going
2	gone
11	good
1	google
9	got
1	government
1	government‚äôs
3	grade
1	grades
2	graduate
1	grandparent
1	grandparents
1	great
1	greatest
1	guarantee
5	had
1	hand
1	hands
1	happy
9	hard
2	harder
1	hardly
1	harry
5	has
22	have
2	having
4	he
1	headed
2	health
1	hello
4	help
4	her
5	here
1	herself
2	he‚äôs
5	high
1	him
4	his
1	history
1	hit
4	home
1	homelessness
2	hometown
4	homework
1	honors
1	hope
1	hour
1	hours
2	how
1	how‚äôs
1	hundred
2	hundreds
41	i
5	if
1	illinois
1	imagine
2	important
40	in
1	indonesia
1	ingenuity
1	innovator
1	insights
1	inspiring
1	intellect
2	into
1	inventor
1	involved
1	iphone
8	is
3	isn‚äôt
22	it
8	it‚äôs
2	i‚äôd
8	i‚äôm
5	i‚äôve
4	jazmin
1	jk
3	job
1	jobs
2	join
1	jordan
8	just
1	justice
2	keep
1	kept
4	kids
2	kindergarten
1	kitchen
13	know
1	knowledge
1	lady
1	law
1	lawyer
7	learn
1	learning
1	left
2	less
1	lessons
5	let
8	life
9	like
1	lines
1	listen
2	little
1	live
1	lived
2	lives
1	local
1	lonely
2	longer
2	look
1	looks
1	los
2	lost
7	lot
1	love
1	loved
7	make
1	making
1	man
1	managed
1	many
2	math
3	matter
11	maybe
1	mayor
4	me
2	mean
2	means
1	medicine
3	meet
1	member
1	memory
1	michael
1	michelle
1	middle
5	might
1	military
1	minute
2	missed
1	monday
3	money
1	moon
6	more
2	morning
5	most
3	mother
4	much
11	my
3	nation
1	necessarily
12	need
1	neglecting
1	neighborhood
1	neighborhoods
2	neither
1	nervous
1	never
8	new
1	newspaper
2	next
8	no
1	none
12	not
1	note
1	nothing
7	now
1	nurse
1	obama
53	of
1	offer
1	officer
1	ok
1	old
18	on
1	once
8	one
1	ones
2	one‚äôs
1	opportunities
2	opportunity
32	or
4	other
7	our
4	out
3	over
1	overcame
5	own
2	paper
1	parent
4	parents
1	parents‚äô
1	part
2	pay
1	paying
9	people
2	perez
1	picnic
1	play
1	police
1	potter
1	poverty
1	practice
1	president
1	pressuring
1	pretty
1	principals
1	probably
2	problem
2	problems
1	program
1	project
1	protect
2	proud
1	provide
1	public
1	published
1	pushing
3	put
2	questions
2	quit
2	quitting
1	raised
1	rapping
1	read
1	reading
1	ready
1	reality
1	really
1	refused
1	rejected
1	relevant
1	resolve
1	responsibilities
8	responsibility
1	revolution
1	rich
7	right
1	rights
1	roma
1	rowling‚äôs
2	safe
1	said
2	same
3	sat
2	say
1	scholarship
18	school
4	schools
3	schoolwork
1	schultz
2	science
1	second
1	seem
1	senator
1	send
1	seniors
1	sense
1	serious
2	set
1	setting
2	shantell
6	she
1	she‚äôs
1	shots
2	should
2	show
1	shows
2	sign
1	similar
1	simple
1	since
1	sing
5	single
3	sit
3	skills
13	so
1	social
2	solve
1	solving
6	some
1	someone
7	something
1	sometimes
1	song
1	sorts
1	speak
1	speeches
2	spend
1	spending
1	sport
1	stand
1	standards
1	star
2	start
1	started
1	starting
3	stay
1	stayed
1	steve
2	still
3	story
1	strength
1	struggled
1	struggling
1	student
6	students
1	studies
2	study
2	studying
1	stupid
1	subject
3	succeed
1	succeeded
1	success
3	successful
1	summer
1	support
1	supporting
1	supportive
1	supreme
2	sure
1	surgeries
1	table
2	take
1	taken
1	takes
1	talents
1	talk
4	talked
1	talking
2	teach
4	teacher
4	teachers
1	teachers‚äô
2	team
1	teased
1	technologies
1	texas
3	than
1	thank
22	that
9	that‚äôs
53	the
5	their
4	them
1	themselves
1	then
5	there
2	there‚äôs
2	these
9	they
8	things
2	thinking
12	this
8	those
1	thousands
1	three
4	through
1	ticket
6	time
7	times
98	to
8	today
3	too
1	took
1	tough
1	toughest
3	track
1	train
1	treatments
1	tried
2	trouble
1	troublemaker
1	trust
1	truth
2	try
1	trying
1	tuning
1	turn
1	turning
3	tv
1	twelfth
1	twelve
1	twenty
1	twitter
1	two
2	understand
1	understandable
1	university
2	unless
3	until
11	up
4	us
1	vaccine
1	varsity
1	virginia
1	volunteer
1	wage
1	wakefield
1	waking
9	want
1	war
9	was
1	wash
3	wasn‚äôt
2	way
6	we
1	weakness
2	well
3	went
3	were
1	we‚äôve
15	what
1	whatever
2	what‚äôs
14	when
1	whenever
9	where
1	whether
1	which
17	who
1	who‚äôs
1	who‚äôve
2	why
1	wife
7	will
1	winter
1	wishing
12	with
1	without
1	won
3	won‚äôt
5	work
3	worked
2	working
3	world
1	worse
1	would
3	write
1	writer
1	written
1	xbox
3	year
6	years
119	you
4	young
48	your
7	yourself
9	you‚äôll
13	you‚äôre
4	you‚äôve
20	‚äì

Total Word Count: 		2627
Total UNIQUE Word Count: 	683
Average word length: 		15.603
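Several details of this output can be traced from the numbers alone. The blank entry (175) and the ‚äì entry (20) are not words, and 2627 − 175 − 20 = 2432, close to the word processor's 2437. The ‚äô sequences look like a UTF-8 curly apostrophe (bytes E2 80 99) decoded as Mac OS Roman, which appears to have been the default charset on Mac OS X JVMs of that era, and then lowercased by toLowerCase() (Ä becomes ä); ‚äì would be a UTF-8 en dash (E2 80 93) decoded the same way. Separately, printResults() divides the total number of characters by the unique word count (683) rather than by the total word count, which is why the average comes out at 15.6. A minimal sketch addressing the last two points, assuming the speech file was saved as UTF-8; the class name, the command-line path argument, and the delimiter are illustrative assumptions:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordStats {
    // Average length over every word occurrence: total characters / total words.
    static double averageLength(Map<String, Integer> wordList) {
        int totalCount = 0;
        long noOfCharacters = 0;
        for (Map.Entry<String, Integer> e : wordList.entrySet()) {
            totalCount += e.getValue();
            // each occurrence of a word contributes its full length
            noOfCharacters += (long) e.getKey().length() * e.getValue();
        }
        return totalCount == 0 ? 0.0 : (double) noOfCharacters / totalCount;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // args[0]: path to the speech text. "UTF-8" is an assumption about how the file was saved;
        // naming it explicitly avoids relying on the platform's default charset.
        Scanner sc = new Scanner(new File(args[0]), "UTF-8");
        sc.useDelimiter("[^A-Za-z'\u2019]+");
        Map<String, Integer> wordList = new TreeMap<String, Integer>();
        while (sc.hasNext()) {
            String word = sc.next().toLowerCase();
            Integer n = wordList.get(word);
            wordList.put(word, n == null ? 1 : n + 1);
        }
        sc.close();
        System.out.println("Total UNIQUE Word Count: " + wordList.size());
        System.out.println("Average word length: " + averageLength(wordList));
    }
}
```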


