I started working on a custom data structure recently. This structure would require some sort of underlying list or queue to be used. My goal was was to optimize insertion, pop, and peek to all be O(1). Unfortunately, the Python List and the Python Queue fail to fit the bill. While I bet Python has something in one of its libraries, what self respecting programmer can resist the urge to make something from scratch.
To fulfill the need, I didn't want to use any builtin Lists. Why? Because Lists allocate a certain amount of memory beneath the scenes. When you add in a value that won't fit, a nasty process of reallocating memory occurs. Since this is my first made-from-scratch Python data structure, I'm going to use a seemingly easy one to implement, and then discuss optimizing it in a later tutorial.
The structure that we'll be using is called a Singly Linked list. This implementation will be accomplished by creating Node objects. These Nodes will be simple. They will contain an element, and a link to the next node in line. With multiple Nodes, the structure will look something like this:
With a few accessors and mutators, we can create a simple Node class. I also create a __str__ method to simplify printing. All-together, here's what I have so far:
class LinkedList: '''Linked List''' class __Node: '''private node object. Node's consist of an element and a pointer to the next Node''' def __init__(self, element=None): self.element = element self.next = None def __str__(self): '''string representation of Node''' return str(self.element) def hasNext(self): '''returns true if the Node points to another Node''' return self.next != None def getNext(self): '''get next Node''' return self.next def getElement(self): '''get the element of the currentNode''' return self.element def setNext(self,nextItem): '''set the next Node of the currentNode''' self.next = nextItem def setElement(self, element): '''set the element of the current Node''' self.element = element
As you can see, I opted to make Node a private class inside of Linked List. Why Private? There's no reason the user should ever see 'Node' objects. The user does not need to know they exist! To make something private, just add two underscores to the front of it. This is an important feature of Python; we'll want to remember it.
Next on our list is initiating a Linked List. My approach is to always have an empty Node at the front, this Node is known as a sentinel. This ensures me that I never have to worry about some of the complications that an empty data structure could encounter. One other thought. How will we keep track of the length of this beast? While we could iterate through the list every time we need the size O(n), I think keeping a length attribute is the smarter decision O(1).
def __init__(self): '''Creates the LinkedList''' self.first = LinkedList.__Node() self.length = 0
So far, so good. While you might be tempted to go straight into writing the append code, I personally want to take care of some more house keeping first. To use python's len keyowrd, the class needs a function called __len__.
def __len__(self): '''returns the length of the list O(1)''' return self.length def isEmpty(self): '''returns true if the list is empty''' return self.length == 0
Not much glam, but this is where things get interesting. Given the nature of the Singly Linked list, adding an element to the front of it should be really easy. All we need to do is create a new Node with an element in it. The new Node should point to whatever self.front is pointing to, and the self.front needs to point at the new node. Also, we'll need to increment length.
def addToFront(self,element): '''Optimized method to add an element to the front of the list O(1)''' toAdd = LinkedList.__Node(element) toAdd.setNext(self.first.getNext()) self.first.setNext(toAdd) self.length += 1
At this point, I hope you're thinking to yourself, "this Linked List is awesome!" And why shouldn't you? So far, it can mimic pushing onto a stack with O(1) goodness. We could even implement a pop front very easily too:
def popFront(self): '''Optimized method to add pop front element from list O(1)''' if self.length > 0: val = self.first.getNext().getElement() self.first.setNext(self.first.getNext().getNext()) self.length -= 1 return val else: raise IndexError("pop from an empty List")
Aside from peeking, our linked list already fulfills all the requirements of a stack. As a practice exercise, you could implement your own peek algorithm. However, our goal wasn't to make a stack, it was to make a linked list. A more complicated feature would be appending. If we're adding to the end, all we have to do is keep looking at the next Node if there's a node following it. Notice this is an O(n) algorithm, we have to look at EVERY node in our list in order to append. Unfortunately, with the current specs of our Linked List, this is inevitable.
def append(self, element): '''Add element to last position of the list''' currentNode = self.first while currentNode.hasNext(): currentNode = currentNode.getNext() currentNode.setNext(LinkedList.__Node(element)) self.length+=1
There is a certain beauty in the simplicity of this code. This is where the simplicity starts to fade a bit. Now it's time to implement pop(). To pop an element, we must find it in the list. we must then make the previous element point to the element following the one we want to remove. We'll need to decrement length of course, and we want to return the value we popped. Once again, this method seems to grow at a rate of O(n).
def pop(self, index =None): '''Removes and returns an element in the list. last element by default.''' if len(self) == 0: raise IndexError("pop from empty list") #utilize default parameter if index ==None: index = len(self)-1 if type(index) != int: raise TypeError("Index must be an integer or a slice not a "+str(type(index)).split("'")) #If the index is out of bounds if index >= self.length or index < 0: raise IndexError("Index out of bounds") previous = self.first for i in range(index): previous = currentNode.getNext() toRemove = previous.getNext() afterNext = None if toRemove.hasNext(): afterNext = toRemove.getNext() previous.setNext(afterNext) self.length-=1 return toRemove.getElement()
The insert algorithm is going to handle very similarly, so for the sake of brevity you'll have to look in the spoiler tags to see it.At this point, I found myself needing to iterate the list to get to a specific node a lot, so I decided to make a private function to handle that for me.
def __getNodeAtPosition(self, index): '''(Private) Gets a Node at a given index''' currentNode = self.first for i in range(index+1):#Adds 1 to account for initial Node (sentinel) currentNode = currentNode.getNext() return currentNodeThis function reduces a lot of repeated code in insert, append, pop, and a few functions we haven't discussed yet.
I also found myself checking to see if a given index is valid a lot, so I implemented a private function to handle that too.
def __checkIndex(self, index): '''(Private) check if the index is an acceptable value.''' if type(index) != int: raise TypeError("Index must be an integer or a slice not a "+str(type(index)).split("'")) #If the index is out of bounds if index >= self.length or index < 0: raise IndexError("Index out of bounds")
An important method of this data structure is __getitem__(self, index). By implementing this function, your data structure can be indexed using the notation of myList[x] and it can also be used in for loops in the context of for index in myList. On the surface, implementing __getitem__ appears easy.
def __getitem__(self, index): '''Allows for indexing, index must be an integer''' self.__checkIndex(index) return self.__getNodeAtPosition(index).getElement()For simple tests this appeared to work. But what happens if we try to slice it? We get an error. A slice object is not an integer, therefore we're in trouble. Slicing one of Pythons builtin Lists yields a new List. Don't believe me? Try going into idle and typing the following:
>>> x = [1,2,3,4,5,6,7,8]
You'll find that x's value hasn't changed. This is due to the fact that when we sliced it, we created a new list. To implement slicing, you must realize that a slice object has a start, stop, and step attribute. You must also be aware that these attributes are read only. Seems easy right? Don't forget, you can leave any of these attributes as None (like my x[5:] example where stop and step are None) and function should behave properly. Ok, still not the hardest problem ever, right? It gets worse! Slices can utilize negative numbers. My solution for slicing looks like this:
def __getitem__(self, index): '''Allows for indexing, index must be an integer''' #accounts for slicing if type(index) == slice: return self.__sliceList(index) return self.__getNodeAtPosition(self.__checkIndex(index)).getElement() def __sliceList(self,theSlice): '''(Private) function to handle slicing. Returns a Linked List''' retList = LinkedList() if theSlice.step == None: step = 1 else: step = theSlice.step if theSlice.start == None: if step > 0: start = 0 else: start = self.length-1 else: start = theSlice.start if theSlice.stop == None: if step > 0: stop = self.length else: stop = -1 else: stop = theSlice.stop for eachItem in range(start,stop,step): retList.append(self.__getitem__(eachItem)) return retListPretty nasty, but it works. Unfortunately, due to our implementation, I believe slicing is an O(n**2) algorithm. You'll also notice that __checkIndex now returns a value. This was done to incorporate negative values. You might be thinking to yourself, "I can reduce some repeated code here" You're right. Check the code in the spoiler tags to see a much cleaner __sliceList function.
The last technique I'd like to point out was for the __str__ function. I wanted my __str__ function to mimic the one in Python's builtin List class. Here's how I did it
def __str__(self): '''returns a string representation of the list''' if self.length == 0: return '' retString = "[" currentElement = self.first.getNext() for i in range(self.length): retString += str(currentElement) +", " currentElement = currentElement.getNext() return retString[:-2] + ']'What I'd like to point out is that I add a comma and a space after every element, and I simply slice the last comma-space off at the end so that I don't have to use any annoying conditionals to try to determine which value is last.
There's a lot more to my Singly Linked List, but that's enough in-depth discussion of it for now. Feel free to look at the code in the spoiler tag to see the completed version. Notice that you can add 2 singly linked lists together, and that my linked list even supports addition to lists (although lists won't support adding my linkedList to them).
The code below has been further cleaned and optimized. Many of the above methods have been modified. For example, __checkIndex now correctly handles negatives. This is the final version of the Singly Linked List class unless I notice something horrendous. The next tutorial will cover optimization techniques and Doubly Linked Lists.
This post has been edited by atraub: 26 January 2011 - 07:36 AM