Subscribe to Stuck in an Infiniteloop

## An Application of Suffix Arrays

I was on CodeAnthem the other day and the following query was presented:

Quote

Given a string, return the longest repeating substring, i.e. for the string "When life hands you lemonade, make lemons", the longest repeating substring is "lemon".

(More accurately it is " lemon", but for the purposes of this post we'll ignore surrounding white space, more on this later).

So how does one approach this problem? My research led me to conclude that the best avenue of attack was a suffix array.

Algorithm:
1. Create an array of every suffix of the given string
2. Sort
3. Iterate through the array comparing the current index to the previous index
4. Assemble a temporary string, continue until the characters do not match, keep a track of the longest substring thus far
5. If the length of the assembled string exceeds the counter add it to a collection of "possibles"
6. Return the longest string out of the possibles.

Quick sidebar: The problem might specify that the repeating substring must not overlap, but that's not a specification here, so the longest repeating substring is "ana" rather then "an" or "na".

Create a suffix array:

The easiest way to do this is to simply strip off the leading character and store that substring:

```public static String findLongestSubstring(String value) {
String[] strings = new String[value.length()];
String longestSub = "";

//strip off a character, add new string to array
for(int i = 0; i < value.length(); i++){
strings[i] = new String(value.substring(i));
}
//debug/visualization
//before sort
for(int i = 0; i < strings.length; i++){
System.out.println(strings[i]);
}
//...continued later

```

Banana's suffix array:

Quote

banana
anana
nana
ana
na
a

This in and of itself isn't really helpful, but once we sort it...

Sort:

```//let's not reinvent the wheel
Arrays.sort(strings);
System.out.println();

//debug/visualization
//after sort
for(int i = 0; i < strings.length; i++){
System.out.println(strings[i]);
}

```

Banana's suffix array becomes:

Quote

a
ana
anana
banana
na
nana

Now that it's sorted, repeating sections of the string become more apparent.

Iteration and comparison:

Since we don't need to verify or check overlap all we need to check is the current index and the one that precedes it.

```Vector<String> possibles = new Vector<String>();
String temp = "";
int curLength = 0, longestSoFar = 0;

/*
* now that the array is sorted compare the letters
* of the current index to those above, continue until
* you no longer have a match, check length and add
* it to the vector of possibilities
*/
for(int i = 1; i < strings.length; i++){
for(int j = 0; j < strings[i-1].length(); j++){
if (strings[i-1].charAt(j) != strings[i].charAt(j)){
break;
}
else{
temp += strings[i-1].charAt(j);
curLength++;
}
}
//this could alleviate the need for a vector
//since only the first and subsequent longest
//would be added; vector kept for simplicity
if (curLength >= longestSoFar){
longestSoFar = curLength;
}
temp = "";
curLength = 0;
}

```

Search possibles:

We now have some possibilities for longest substring. Iterate through the vector and grab the longest one. We are concerned with whitespace up until the end, so trim() the result and return:

```//iterate through the vector to find the longest one
int max = 0;
for(int i = 0; i < possibles.size();i++){
//debug/visualization
System.out.println(possibles.elementAt(i));
if (possibles.elementAt(i).length() > max){
max = possibles.elementAt(i).length();
longestSub = possibles.elementAt(i);
}
}
System.out.println();
//concerned with whitespace up until this point
// "lemon" not " lemon" for example
return longestSub.trim();
}

```

Output:

Quote

//vector contents
a
ana

//end result
ana

Full code with several examples:
(added more print statements for visualization)

```public class Test{
public static void main(String[] args){
System.out.println(findLongestSubstring("i like ike"));
System.out.println();
System.out.println();
System.out.println(findLongestSubstring("When life hands you lemonade, make lemons"));
System.out.println();
System.out.println();
System.out.println(findLongestSubstring("banana"));
}
public static String findLongestSubstring(String value) {
String[] strings = new String[value.length()];
String longestSub = "";

//strip off a character, add new string to array
for(int i = 0; i < value.length(); i++){
strings[i] = new String(value.substring(i));
}

//debug/visualization
//before sort
for(int i = 0; i < strings.length; i++){
System.out.println(strings[i]);
}

Arrays.sort(strings);
System.out.println();

//debug/visualization
//after sort
for(int i = 0; i < strings.length; i++){
System.out.println(strings[i]);
}

Vector<String> possibles = new Vector<String>();
String temp = "";
int curLength = 0, longestSoFar = 0;

/*
* now that the array is sorted compare the letters
* of the current index to those above, continue until
* you no longer have a match, check length and add
* it to the vector of possibilities
*/
for(int i = 1; i < strings.length; i++){
for(int j = 0; j < strings[i-1].length(); j++){
if (strings[i-1].charAt(j) != strings[i].charAt(j)){
break;
}
else{
temp += strings[i-1].charAt(j);
curLength++;
}
}
//this could alleviate the need for a vector
//since only the first and subsequent longest
//would be added; vector kept for simplicity
if (curLength >= longestSoFar){
longestSoFar = curLength;
}
temp = "";
curLength = 0;
}

System.out.println();
//debug/visualization
System.out.println("Longest string length from possibles: " + longestSoFar);

System.out.println();
//iterate through the vector to find the longest one
int max = 0;
for(int i = 0; i < possibles.size();i++){
//debug/visualization
System.out.println(possibles.elementAt(i));
if (possibles.elementAt(i).length() > max){
max = possibles.elementAt(i).length();
longestSub = possibles.elementAt(i);
}
}
System.out.println();;
//concerned with whitespace up until this point
// "lemon" not " lemon" for example
return longestSub.trim();
}
}

```

Output sans debug printing:

Quote

ike
lemon
abra
ana

--
Happy coding!

### 4 Comments On This Entry

Page 1 of 1

#### Martyr2

07 September 2010 - 09:50 AM
Very very nice. I must say that it is a beautiful and elegant implementation. Looks like it could be easily ported over to another language too. All around A+.

0

#### Mercurial

09 September 2010 - 12:50 AM
Nice solution, I like it.
0

#### Tuishimi

11 September 2010 - 08:58 AM
"Return the longest, repeating substring". It is nice, but aren't you reusing letters in "banana"? Shouldn't the longest be "an" or "na"?
0

#### KYA

11 September 2010 - 01:06 PM
@Tuishimi "ana" repeats if you don't count the overlapping 'a'. One could avoid overlapping with a quick conditional while checking for letter patterns

@Martyr2 and Mercurial Thanks!
0
Page 1 of 1

S M T W T F S
1234567
891011121314
151617181920 21
22232425262728
2930