2 Replies - 322 Views - Last Post: 20 January 2013 - 10:09 AM Rate Topic: -----

#1 bteeple  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 26
  • Joined: 01-May 11

Least Common Substring Algorithm

Posted 19 January 2013 - 12:00 PM

* This is a homework problem, I am not looking for you to do it. I just want some insight on where to go on from here or where my problem is with my algorithm not working. Thanks.

Basically this program is a spell checker and auto correct. I know it is a bad and horrible code. The teacher just wants us to practice with simple algorithms and learning how to do them. I know the program is not complete, as I have to figure out this algorithm before I can move on.

Problem:

I can't get the algorithm to correctly find the least common sub string. The algorithm is as shows:


0 : j = 0 or i = 0
L[i][j] { L[i-1][j-1] + 1 : x[i-1]=y[j-1]
Math.max(L[i][j-1],L[i-1][j]): x[i-1] != y[j-1]



Basically the terms on the right side of the ":" are what I use to test the statements using if statements as you can see below. And the ones on the right are the actual code to place values into a 2d array. If i or j = 0 then nothing is done and it is moved on, if they equal each other then a 1 gets placed into that slot of the 2d array. And if they do not equal each other it will take the max value of the position above and to the left of that index and copy the value into the new position. After all is said and done, the least common sub string will be in the very last index.

My problem is I can't seem to get it to work. I tried the debugger and every time I think it is one if statement, I test another one and it contradicts my last experiment. I am lost. Please help!


I have pasted all of the files below so that if you could test and figure it out on your own. The output I am looking for is:

Also, I have attached the dictionary file I will be using with this assignment as it is way to big to post.

abcde -> abcde length = 5
kitten -> sitting length = 4
gattacca -> acgtacgt length = 4
Best match for nessicary is 'null'.
Best match for nesiccary is 'null'.
Best match for necisary is 'null'.
Word at position 12345 is 'basidium'.
Word at position 123456 is 'proteoglycan'.
Word at position 49931 is 'encased'.



import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

/**
 * This program is a basic spelling correction suggester. It will attempt to fix spelling errors in mistyped words.
 * 
 * @author Blake Teeple
 * 
 * CS1122 Spring 2013
 * Program 1
 *
 */
public class SpellingCorrector {
	private String[] dictionary;

	/**
	 *	This method will load the dictionary file and input the values from that file
	 *	into an array to be used throughout the program.
	 * 
	 * @param dictFilename
	 *            name of the dictionary file to load
	 */
	void loadDictionary(String dictFilename) throws IOException {

		Scanner in = new Scanner(new File("dictionary.txt")); //Input the dictionary file.

		int arraysize = 0;

		for(int i = 0; i < 1; i++){  //Find the size of the dictionary file.
			if(in.hasNextInt()){
				arraysize = in.nextInt();
			}
		}
		String[] dictionary2 = new String[arraysize];  //Create a new array to house the dictionary.

		for(int i = 2; i < dictionary2.length; i++){ //Put all of the strings from dictionary.txt into the dictionary2 array.
			if(in.hasNext()){
				dictionary2[i] = in.next();
			}
		}
		dictionary = dictionary2.clone(); //Clone dictionary2 into the previous created dictionary. And close dictionary.txt.
		in.close();
	}

	/**
	 * 	This method will return the dictionary array.
	 * 
	 * 
	 * @return Contents of the dictionary file as an array of Strings
	 */
	String[] getDictionary() {

		return dictionary; //Returns the dictionary array.
	}

	/**
	 * 	This method will find the longest common substring in two given words. It will then return the integer
	 * 	value of how long the substring is.
	 * 
	 * @param x
	 *            String to be compared to y
	 * @param y
	 *            String to be compared to x
	 * @return The edit distance for strings x and y
	 */
	int getLcsLength(String x, String y) {

		int[][] lcs = new int[x.length() + 1][y.length() + 1];
		char[] xchar = new char[x.length()];
		char[] ychar = new char[y.length()];

		for(int i = 0; i < xchar.length; i++){
			xchar[i] = x.charAt(i);
		}
		for(int i = 0; i < ychar.length; i++){
			ychar[i] = y.charAt(i);
		}

		for(int i = 0; i < x.length() + 1; i++){
			for(int j = 0; j < y.length()  + 1; j++){
				if(j == 0 || i == 0){
					lcs[i][j] = 0;
				}
				else if(xchar[i - 1] == ychar[j - 1]){
						lcs[i][j] += 1;
				}
				else if(xchar[i - 1] != ychar[j - 1]){
					lcs[i][j] = Math.max(lcs[i][j-1],lcs[i-1][j]);
				}

			}
		}


		return lcs[x.length()][y.length()];
	}

	/**
	 * 	This method will search through the dictionary using the least common substring
	 * 	and look for a replacement that is the closest in spelling. If more than one word is found
	 * 	it will replace it with the word closest to the beginning of the dictionary.
	 * 
	 * @param s
	 *            String to find closest match for
	 * @return
	 */
	String bestMatch(String s) {


		return null;
	}

	/**
	 * 	This method will accept a file and read through the file changing any spelling errors or spelling suggestions
	 * 	with the best match according to the bestMatch method. It will then output a file with the new suggestions
	 * 	and spelling corrections.
	 * 
	 * @param input
	 *            Name of the input file to be 'corrected'
	 * @param output
	 *            Name of the 'corrected' output file
	 * @throws FileNotFoundException
	 */
	void correctFile(String input, String output) throws Exception {
		/*		Scanner in = new Scanner(new File("input"));
		BufferedWriter out = new BufferedWriter(new FileWriter("output"));

		while(in.hasNext()){
			String word = in.next();

			bestMatch(word);
			out.write(word);
		}

		in.close();
		out.close();*/
	}

}



public class Main {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		try {
			SpellingCorrector theSpeller = new SpellingCorrector();

			theSpeller.loadDictionary("dictionary.txt");

			String x = "abcde";
			String y = "abcde";
			int length = theSpeller.getLcsLength(x, y);
			System.out.println(x + " -> " + y + " length = " + length);

			x = "kitten";
			y = "sitting";
			length = theSpeller.getLcsLength(x, y);
			System.out.println(x + " -> " + y + " length = " + length);

			x = "gattacca";
			y = "acgtacgt";
			length = theSpeller.getLcsLength(x, y);
			System.out.println(x + " -> " + y + " length = " + length);

			String nonWord = "nessicary";
			String match = theSpeller.bestMatch(nonWord);
			System.out.println("Best match for " + nonWord + " is '" + match
					+ "'.");

			nonWord = "nesiccary";
			match = theSpeller.bestMatch(nonWord);
			System.out.println("Best match for " + nonWord + " is '" + match
					+ "'.");

			nonWord = "necisary";
			match = theSpeller.bestMatch(nonWord);
			System.out.println("Best match for " + nonWord + " is '" + match
					+ "'.");

			String[] allWords = theSpeller.getDictionary();
			int whichWord = 12345;
			System.out.println("Word at position " + whichWord + " is '"
					+ allWords[whichWord] + "'.");
			whichWord = 123456;
			System.out.println("Word at position " + whichWord + " is '"
					+ allWords[whichWord] + "'.");
			whichWord = 49931;
			System.out.println("Word at position " + whichWord + " is '"
					+ allWords[whichWord] + "'.");


			theSpeller.correctFile("input.txt", "corrected.txt");
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

}



Sorry I forgot to put what I am getting for my input.

This is what I am getting:

abcde -> abcde length = 1
kitten -> sitting length = 1
gattacca -> acgtacgt length = 1
Best match for nessicary is 'null'.
Best match for nesiccary is 'null'.
Best match for necisary is 'null'.
Word at position 12345 is 'basidium'.
Word at position 123456 is 'proteoglycan'.
Word at position 49931 is 'encased'.


Also the algorithm example didn't really show up in the post like I wanted it to but here it is again:

Just like before the options on the left are what is used to test the boolean, and the option on the right is what gets placed in the 2d array.

L[i][j] {

0 : j = 0 or i = 0

L[i-1][j-1] + 1 : x[i-1]=y[j-1]

Math.max(L[i][j-1],L[i-1][j]): x[i-1] != y[j-1]

Is This A Good Question/Topic? 0
  • +

Replies To: Least Common Substring Algorithm

#2 natecat  Icon User is offline

  • D.I.C Head

Reputation: 53
  • View blog
  • Posts: 225
  • Joined: 19-December 11

Re: Least Common Substring Algorithm

Posted 19 January 2013 - 04:48 PM

  
String bestMatch(String s) {
    return null;
}


Is this intentional?
Was This Post Helpful? 0
  • +
  • -

#3 bteeple  Icon User is offline

  • New D.I.C Head

Reputation: 0
  • View blog
  • Posts: 26
  • Joined: 01-May 11

Re: Least Common Substring Algorithm

Posted 20 January 2013 - 10:09 AM

At the moment it is intentional. In order for the bestMatch method to work, I first have to finished my getLcsLength method. As in the bestMatch method I have to call my getLcsLength method.

So to finish this program, it all comes down to my algorithm. :(
Was This Post Helpful? 0
  • +
  • -

Page 1 of 1