Subscribe to Martyr2's Programming Underground        RSS Feed
***** 1 Votes

Intersection of LinkedLists Using Java

Icon 3 Comments
Yesterday a user came through the Java forum looking for help with finding the intersection of two LinkedList sets. If you were anything like I was in school, you probably had a flash back to math class and the teacher drawing Venn Diagrams up on the white board. Two circles, usually representing two sets of elements, and where they overlapped was elements shared between the sets. Maybe they even drew 3 or 4 circles and did the whole shading and crosshatching thing. You probably figured "Where the hell am I going to use this?" Well today my friend is where learning it will help you understand a programming solution. Learn how to find which elements share LinkedList sets on this entry of the Programming Underground!

<That song about loving the sound of broken glass from the movie "Little Monsters"... by far Fred Savage's finest work *chuckle*>

Now imagine you have two sets of numbers. One is 1, 2, 3 and 4. The other set is 2, 4, 6, 8, and 10. Perhaps you want to find out which elements from set A are also found in set B. Just by looking at the sets you can see that the element 4 is in both sets. Pretty simple to see that right? Well imagine that these lists were hundreds or even thousands of items long. In addition to that, imagine that you don't really know which items are in the sets. Perhaps you generated the sets based on some sort of criteria. Perhaps the elements in each set are more complex than just counting numbers. You can quickly see where our problems lie. One, we don't have the time to go through thousands of items in the list looking for what they share. Two, if we don't exactly know what are in the sets, how can we find matches? Three, what if the elements were not just numbers, but a custom object you created? How would you go about finding matching dohickies?

Despite these little problems, lets see what we do know about sets. Well, we know that the most elements the result can have is the number of elements of the smaller set. Taking our example at the top, we know the most the result set will have is 4 elements. That is because set A has only 4 elements. This would mean that set A's elements are going to be completely inside set B's elements. In other words, set A is a subset of set B (or the same set if the two sets had the same number of elements).

Simple enough right? Ok, the original poster was asking about LinkedLists which contain integers only. We are going to go a bit beyond that and we are going to devise a small algorithm which can take two LinkedLists full of ANY kind of objects and find out which elements they share. Our only requirement is that the elements tell us how they can be compared to one another. In other words, the objects must implement the Comparable interface. For those who don't know about the Comparable interface, this interface says that any object which implements the interface must create a method called "compareTo" in their implementation. This method then must be able to compare itself with another element of the same type to tell us equality. For instance, the String class implements Comparable, so it has a method called compareTo which is used to compare one instance of the String class against another instance of the String class for equality. The method simply returns -1, 0 or 1 for less than, equal to, or greater than respectively. The interface is a "contract" which guarantees that any object that implements it will have the compareTo functionality to tell us equality.

We need this ability so that we can loop through one list, element by element, and look for it in the second list. Sounds pretty straight forward a nested for loop should do the trick. So lets dive right in and create a version for Strings, Integers, perhaps a custom class or two etc. If you are saying that to yourself, I suggest you hold up one second. This might be a great situation for a little Generics!

I won't go into generics too much here, but think of it as a way to make your code, well, generic. One function that could work on Strings and then turn around and be used for Integers and then whipped into working for custom objects. All this using ONE version of the algorithm. As long as the objects implement this Comparable interface, our generic algorithm should work perfectly despite the type used at that particular moment.

Now that we know the general pattern of needing a nested loop, which loops through one set and looks for matching elements in the other set, we can boilerplate design. In addition, we also know that a generic solution would be ideal for flexibility and work on multiple objects. When we are done we could also put this piece of code in our personal libraries to use for multiple projects. Sounds like a better idea than JackOfAllTrades dressing up like Blowup Doll Suzie for Halloween!

Lets show the code and then dive into the pieces afterwards...

import java.util.*;

public class Intersection {

	// Static method which takes two LinkedLists of objects (that implement the Comparable interface) and find any intersecting elements.
	// Returns a LinkedList of elements which intersect both sets.
	public static <T extends Comparable<T>> LinkedList<T> getIntersection(LinkedList<T> SetA, LinkedList<T> SetB) {
	
		// New results LinkedList
		LinkedList<T> results = new LinkedList<T>();
		
		// Loop through the first set using an iterator. 
		for (Iterator<T> it = SetA.iterator(); it.hasNext(); ) {
			T item1 = it.next();
			
			// Look for each element of the first set in the second.
			for (Iterator<T> it2 = SetB.iterator(); it2.hasNext(); ) {
				T item2 = it2.next();
				
				// Compare if the two are equal, then add them to result if they are.
				if (item1.compareTo(item2) == 0) {
				   results.add(item1);
				   break;
				}
			}
		}
		
		return results;
	}
}



Wow, lots of stuff going on here and maybe the generics in it also has you a bit shaky. No worries, take out the comments and you quickly see this thing is much smaller. Also understanding what we are doing with the Type parameters here and you will quickly see the light at the end of the tunnel. First off, we see that this is a static method called "getIntersection" which is a method of our class "Intersection". I made this static because all we really need here is two lists to compare and that itself doesn't require us to create an instance of the Intersection object to accomplish. Just like the sqrt() function doesn't need us to create a "Math" object to use.

The second thing you may notice is that we take in two LinkedLists called SetA and SetB. But what is with this <T> garbage? Well this is where the generics come in. It is called a Type Parameter and essentially it takes on the type of whatever object is passed to the function. If we pass in two LinkedLists of type String (LinkedList<String>) then T becomes "String" and everywhere you see T you could replace it with the word String (in your mind). If we pass it two Lists of Integer, T becomes Integer and you can replace T with the type Integer. Not too bad there right? Pretty simple. The last bit of generics here that we are going to explain is <T extends Compareable<T>> Again, T is going to take on the type passed to the function, that part hasn't changed. So if we have a String list, the statement will act like <String extends Comparable<String>>. This is essentially saying that all data types we pass to the function, in this case String, must implement the Comparable interface. The keyword "extends" here isn't necessarily the same version you may know from inheritance. It can mean inherits, but in this case it can also mean implements. You use the keyword "extends" in both situations, for objects and for interfaces when it comes to generics. All objects passed to our method must have a compareTo method which tells us how to compare itself with other objects of the same type.

Hopefully that was as clear as mud. Now lets try out a simple program which uses this class and show how this bad boy works...

import java.util.*;

public class IntersectionTest {
	public static void main(String args[]) {
	
		// To LinkedList sets of Integer objects (not int, int is a primative, Integer is an object)
		LinkedList<Integer> Set1 = new LinkedList<Integer>();
		LinkedList<Integer> Set2 = new LinkedList<Integer>();
		
		// Fill sets with some data
		Set1.add(1);
		Set1.add(4);
		Set1.add(3);
		
		Set2.add(3);
		Set2.add(2);
		Set2.add(4);
		
		// Call our method and give it the two LinkedLists, it returns a result LinkedList with elements that are in both sets.
		LinkedList<Integer> results = Intersection.getIntersection(Set1, Set2);
		
		// Loop through and print elements in result. We should see both elements 4 and 3 share both sets.
		for(Integer i : results) {
			System.out.println(i);
		}
		
		
		// Lets create two more sets, this time with a custom object we made called Item. Its declaration is below. 
		// Notice it implements the Comparable interface and thus has the compareTo object which takes in another item and essentially compares names for equality.
		
		LinkedList<Item> ItemSet1 = new LinkedList<Item>();
		LinkedList<Item> ItemSet2 = new LinkedList<Item>();
		
		// Fill sets with some data
		ItemSet1.add(new Item("Bannana"));
		ItemSet1.add(new Item("Watermelon"));
		ItemSet1.add(new Item("Strawberry"));
		
		ItemSet2.add(new Item("Plum"));
		ItemSet2.add(new Item("Strawberry"));
		ItemSet2.add(new Item("Tangerine"));
		ItemSet2.add(new Item("Grape"));
		
		// Again, call the SAME function but giving it two sets of our new item
		LinkedList<Item> resultsItems = Intersection.getIntersection(ItemSet1, ItemSet2);
		
		// Loop through and print elements in result. We should see only one element shares, "Strawberry".
		for(Item i : resultsItems) {
			System.out.println(i);
		}
		
	
	}
}


// Our custom class which implements the Comparable interface for Items
class Item implements Comparable<Item> {
	private String _name;
	
	public Item(String name) {
		_name = name;
	}
	
	public String toString() {
		return _name;
	}
	
	public int compareTo(Item anotherItem) {
		return _name.compareTo(anotherItem.toString());
	}
}



As you can see from our little bit of a rambling example that I first create two LinkedLists of Integer objects. I then give them to our static method (remember you call static methods by using the class name and then the method, no need to instantiate) and it returns results of the same type as what we passed in. The beauty here is that if we gave it LinkedLists of different types, SetA was Integers and SetB was Strings for instance, we would get a compile time error which is great. We don't want those errors to appear at runtime, so good to catch that error early!

After we run the first intersection, we print the results which happens to be the numbers 4 and 3. We then switch gears and create two more LinkedLists which feature our own custom object called "Item". This class implements the Comparable interface for comparing Items with one another and thus has the compareTo method. This method basically compares their names.... nothing fancy there. If our object didn't implement this interface, we would get another compile time error. We run the results through and this time only "Strawberry" appears in both sets. The results are then printed out again to show that. Here we just used the same algorithm for two different sets of objects (Integers and Items) and it gave us the elements which are contained in both sets in both situations. Pretty nifty.

Now before you go running off and "generifying" everything, keep in mind that making things too generic can also be harmful. I personally recommend using generics in situations where you really do need the flexibility and to cut down on repeat code. If you find yourself using generics everywhere just to get by with your solutions, perhaps you should take another look at how you are structuring the overall design. I have built a lot of software in my time and rarely do I find a need to use extensive generics. They have their niche and are useful for some solutions, but too much of a good thing is just as bad. Keep code short, simple and readable. That should always be your top goal.

Thank you for reading my blog! :)

If you want more blog entries like this, check out the official blog over on The Coders Lexicon. There you will find more code, more guides and more resources for programmers of all skill levels!

3 Comments On This Entry

Page 1 of 1

Dogstopper Icon

05 November 2009 - 02:03 PM
That was a really great entry! I learned a lot! Thanks!
0

cfoley Icon

09 November 2009 - 05:41 PM
That's pretty cool. I love how you tie in several concepts into one really simple tutorial. This can easily be done without generics and the Comparable interface... but including them shows them off naturally in a practical setting. Bravo!

May I make one suggestion though: that you tweak the tutorial to work with sets? Your algorithm is great for sets since they cannot contain duplicates but lists can contain duplicates so a slightly different algorithm is needed.

For instance, take the following two lists: (1, 1, 1, 1, 2} and {2, 2, 1}. We can see the intersection is {1, 2} but the set intersection algorithm gives [1, 1, 1, 1, 2] or [2, 2, 1] depending on the order of the arguments.
0

Martyr2 Icon

10 November 2009 - 10:09 PM
You bring up some valid points Cfoley. It does work more precisely with sets than lists due to the fact that lists can contain duplicates. I think I just didn't go far enough with comparison checking between the lists, but do believe the theory is still sound. The problem with the duplicates is that the algorithm doesn't see the difference between one instance of 1 and another instance of 1. I didn't want to go too deep into that idea of the comparisons for fear of losing people. Now if the comparison was working on comparing the exact instance of an object rather than just the simple value of a number or string, it would certainly be fine. But yes, sets would be much better match for my current example.

Another note, I don't really believe you can achieve the level of flexibility I am doing here without the generics. There are certainly methods out there for intersections of collections on standard types, but it will still require some level of generics and comparable interface to compare more complex types. I also believe it is required in order to handle custom objects since Object itself can't truly compare objects beyond equals(). You must provide implementation that tells how Java is to make comparisons between objects you made and only you, the designer, could provide that information. Otherwise it will try to make a specific object comparison rather than comparing the object values themselves.

But again, you are certainly right about the lack of depth on my part for comparing two objects for equality. If your lists are going to have duplicates, then you will need to either 1) Keep track of values already found, 2) Remove duplicates from the lists before you go finding the intersections or 3) Make more specific matches (like matching on multiple fields) to avoid the ambiguity of two values.

Thanks for the comments. :)
0
Page 1 of 1