4 Replies - 4373 Views - Last Post: 29 September 2012 - 08:40 PM

#1 aquafatz

Extract words from text file and store in array-java-Improve

Posted 28 September 2012 - 12:19 PM

Below I have written a program to
-->read around 2000 files stored in a folder
-->for each file, extract the words using split()
-->store these words in an array
-->print them

Could you please suggest a way I can improve the program? I now want to do the following tasks:
-->For each word, remove punctuation such as .,()'" etc.
-->If the word is an HTML tag such as <Title>, don't store it.

Thank you!!

package FirstTry;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

public class FirstTry2
{
	public static void main(String[] args)
	{
		try
		{
			File dir = new File(
					"Path of folder containing around 2000 text files");
			// listFiles() returns null if the path is not a readable directory
			File[] files = dir.listFiles();
			if (files == null)
			{
				System.err.println("Not a readable directory: " + dir);
				return;
			}
			for (File fn : files)
			{
				// try-with-resources (Java 7+) closes the reader even if reading fails
				try (BufferedReader br = new BufferedReader(
						new InputStreamReader(new FileInputStream(fn))))
				{
					String strLine;
					while ((strLine = br.readLine()) != null)
					{
						// split each line on runs of whitespace
						String[] words = strLine.split("\\s+");
						for (String s : words)
						{
							System.out.println(s);
						}
					}
				}
			}
		}
		catch (FileNotFoundException e)
		{
			e.printStackTrace();
		}
		catch (IOException e)
		{
			e.printStackTrace();
		}
	}
}





Replies To: Extract words from text file and store in array-java-Improve

#2 blackcompe

Posted 28 September 2012 - 02:41 PM

You can clean up the string with:

str = str.replaceAll("[^a-zA-Z0-9]", "");


That restricts the string to alphanumeric characters. Strings are immutable, so replaceAll returns a new string that you have to assign. Modify the regex as needed. You can check for a tag with:

str.matches("<.*>");


That regex is very primitive and won't detect even a portion of the complex tags that can be constructed. Look here for more info. Alternatively, you can use an HTML parser to validate the tags.
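
As a rough sketch of how the two calls above might fit together, the class below skips tokens that look like tags and strips punctuation from the rest. The class name WordCleaner and the helpers looksLikeTag, stripPunctuation and extractWords are hypothetical names made up for this illustration, and storing the results in a List instead of an array is an assumption, not something from the original program.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; all names here are illustrative only.
class WordCleaner {

    // True only if the WHOLE token matches <...>, e.g. "<Title>".
    // String.matches() must match the entire string, not just a part of it.
    static boolean looksLikeTag(String token) {
        return token.matches("<.*>");
    }

    // Removes every character that is not an ASCII letter or digit.
    // replaceAll returns a new String; the argument itself is unchanged.
    static String stripPunctuation(String token) {
        return token.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Splits a line on whitespace, drops tag-like tokens, cleans the rest.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (looksLikeTag(token)) {
                continue; // skip tokens such as <Title> or </Title>
            }
            String cleaned = stripPunctuation(token);
            if (!cleaned.isEmpty()) {
                words.add(cleaned); // ignore tokens that were only punctuation
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Hello, world]
        System.out.println(extractWords("<Title> Hello, (world)! </Title>"));
    }
}
```

Because the line is split on whitespace first, a tag glued to a word (for example "<title>Hello") is not recognised as a tag by this check and would come out as "titleHello". Handling that case needs a real HTML parser or stripping tags from the line before splitting.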

#3 aquafatz

Posted 29 September 2012 - 01:10 PM

I tried what you suggested, and it works. But there seems to be another issue.

At the line s = null, when I encounter an HTML tag, I want to set that array element to null. But that is not happening; it still prints the word.
while ((strLine = br.readLine()) != null)
{
	String[] words = strLine.split("\\s+");
	for (String s : words)
	{
		String regex = "[_\\W]";
		//String start = "<";
		//String end=">";
		if (s.matches("<(.*)>"))
		{
			s = null;
			System.out.println("HTML tag");
		}
		String result = s.replaceAll(regex, "");

		System.out.println(result);
	}
}



#4 Kakerergodt

Posted 29 September 2012 - 02:23 PM

That is because in the shortened (enhanced) for loop the variable "s" is not the "slot" in the array itself, only a copy of the reference to the object stored in that slot, so assigning a new value to "s" does not overwrite the value in the array. Use the traditional for loop with an index instead, so that you update array[index].
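
A small sketch of the difference, using made-up names (the class LoopAssignment and both methods are hypothetical, not part of the original program):

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
class LoopAssignment {

    // Enhanced for loop: s is a local copy of each reference,
    // so assigning to s leaves the array untouched.
    static void clearTagsWithForEach(String[] words) {
        for (String s : words) {
            if (s != null && s.matches("<.*>")) {
                s = null; // changes only the local variable
            }
        }
    }

    // Indexed loop: words[i] refers to the array slot itself,
    // so the assignment really replaces the stored value.
    static void clearTagsWithIndex(String[] words) {
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && words[i].matches("<.*>")) {
                words[i] = null;
            }
        }
    }

    public static void main(String[] args) {
        String[] words = {"<Title>", "Hello", "</Title>"};

        clearTagsWithForEach(words);
        System.out.println(Arrays.toString(words)); // [<Title>, Hello, </Title>]

        clearTagsWithIndex(words);
        System.out.println(Arrays.toString(words)); // [null, Hello, null]
    }
}
```

Once a slot is null, calling a method such as replaceAll on it throws a NullPointerException, so any later processing has to skip the null entries (for example with continue).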

#5 aquafatz

Posted 29 September 2012 - 08:40 PM

Oh ok...thanks!! :)