1 Reply - 708 Views - Last Post: 22 April 2012 - 12:09 PM

#1 Hooor

counting words and its count for each text files inside folder

Posted 22 April 2012 - 11:30 AM

Hi,
I am trying to write a program that counts the words in all text files
in a specific directory, storing each word together with the name of the file it appears in.
The text files contain HTML tags, and I am supposed to start counting after the body tag.
For that I used a Map, but when I run it I get a NullPointerException.
Could you tell me where the problem is,
and whether my way of reading the files is correct?
Thanks



import java.io.*;
import java.util.*;

public class DataminingPro {
    public static void main(String[] args) {
        Map<String, Integer> sol = new HashMap<String, Integer>();
        Scanner scan = new Scanner(System.in);
        String f1, temp, line;
        String t[];

        try {
            FileReader fr;
            BufferedReader br;

            System.out.println("Please Enter the directory name : ");
            String s = scan.next();
            File f = new File(s);
            String files[] = f.list();

            for (int i = 0; i < files.length; i++) {
                f1 = files[i];
                fr = new FileReader(s + "\\" + f1);
                br = new BufferedReader(fr);

                // skip lines until one mentions the body tag
                while (((line = br.readLine()) != null && line.indexOf("body") < 0 && line.indexOf("Body") < 0 && line.indexOf("BODY") < 0)) {
                }
                if (line != null) {
                    do {
                        t = line.replaceAll("<>[.,?!:;/]", "").split(" ");

                        // make words lower case
                        for (int m = 0; m < t.length; m++) {
                            t[m] = t[m].toLowerCase();
                        }

                        // count each word per file
                        for (int n = 0; n < t.length; n++) {
                            temp = t[n];
                            if (!(sol.containsKey(f1 + " " + temp))) {
                                sol.put(f1 + " " + temp, 1);
                            } else {
                                sol.put(f1 + " " + temp, sol.get(f1) + 1);
                            }
                        }
                    } while ((line = br.readLine()) != null);
                }
                br.close();
            }
        } catch (Exception ex) {
            System.out.println("There is Exception " + ex);
        }
    }
}
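
On the NullPointerException itself: a likely cause in the code above is the lookup `sol.get(f1)`. The map is filled under the key `f1+" "+temp`, so looking up `f1` alone returns null, and adding 1 to a null Integer throws when it is unboxed. Separately, `f.list()` returns null when the path is not a directory, which would also fail at `files.length`. Below is a minimal sketch of the counting step with a consistent key. The names `WordCountSketch` and `countLine` are made up for illustration, and `Map.merge` assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Counts each word of one line under the key "<file> <word>",
    // the same key shape the posted code stores.
    static void countLine(Map<String, Integer> counts, String fileName, String line) {
        for (String word : line.toLowerCase().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // split can yield an empty first token
            }
            String key = fileName + " " + word;
            // Update the same key that is stored; merge also covers the first occurrence.
            counts.merge(key, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        countLine(counts, "a.html", "the cat and the hat");
        System.out.println(counts.get("a.html the")); // prints 2
    }
}
```

On a JDK older than 8, the equivalent is to read `sol.get(f1+" "+temp)`, treat null as 0, and put the value plus 1 back under the same key.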





Replies To: counting words and its count for each text files inside folder

#2 g00se

Re: counting words and its count for each text files inside folder

Posted 22 April 2012 - 12:09 PM

You can't really parse HTML like that by looking for substrings. For one thing, the <title> tag could easily contain the word 'body'.
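
To make that concrete, here is a small sketch that reuses the three `indexOf` checks from the posted loop. The class and method names are made up for illustration. It shows one line the check wrongly accepts and one real body tag it misses, since HTML tag names are case-insensitive.

```java
class BodySubstringDemo {
    // Same test as the posted while loop: true once a line contains
    // "body" in one of the three casings that were checked.
    static boolean looksLikeBodyLine(String line) {
        return line.indexOf("body") >= 0
                || line.indexOf("Body") >= 0
                || line.indexOf("BODY") >= 0;
    }

    public static void main(String[] args) {
        // A title line, not the body tag, yet the check fires.
        System.out.println(looksLikeBodyLine("<title>Body building tips</title>")); // prints true
        // A valid body tag in mixed case, yet the check misses it.
        System.out.println(looksLikeBodyLine("<bOdY>")); // prints false
    }
}
```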

Use a proper HTML parser. If it's good HTML, you can try the one in the JDK
(http://docs.oracle.c...LEditorKit.html); otherwise use a parser such as Neko.
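
A minimal sketch of the JDK route, assuming the parser meant here is the Swing one driven through `HTMLEditorKit.ParserCallback` and `javax.swing.text.html.parser.ParserDelegator`. The class name `BodyWordCounter`, the method `countBodyWords`, and the word-splitting regex are illustrative choices, not something from this thread. The callback only counts text that arrives between the start and end of the body element.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class BodyWordCounter {
    // Returns word -> count for the text inside <body> only.
    static Map<String, Integer> countBodyWords(String html) {
        final Map<String, Integer> counts = new HashMap<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inBody = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.BODY) {
                    inBody = true;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.BODY) {
                    inBody = false;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (!inBody) {
                    return; // e.g. title text in the head
                }
                // Assumption: a word is a run of letters or digits.
                for (String word : new String(data).toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                    if (!word.isEmpty()) {
                        counts.merge(word, 1, Integer::sum);
                    }
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Body tips</title></head>"
                + "<body><p>Hello hello world</p></body></html>";
        System.out.println(countBodyWords(html));
    }
}
```

For files on disk, the `StringReader` could be swapped for a `FileReader` per file, with the file name prefixed to the key if per-file counts are wanted. Markup is never passed to `handleText`, so no punctuation-stripping regex for tags is needed.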