**Tutorial: Advanced Python Generators**

Hey everybody, I've been in a snippet writing mood lately, and I started writing some generators. scalt's tutorial on generators is a nice introduction to generators, but this tutorial will go into more advanced use cases of generators. The focus of this tutorial will be generators involving enormously huge, and even infinite, sets.

Generators are great because they allow you to iterate over a potentially huge list of numbers without having to store the list in its entirety in memory. This quality can make generators very handy, particularly in resource critical situations. Consider this situation: you want to compute the sum of all values from 1 to 100,000,000. How would you do this? You might try this:

total = 0 for i in range(1,100000001): total += i

However, this lacks a certain elegance that we've come to expect in Python. For those of you more familiar with list comprehensions, you might think that using one is the way to go:

total = sum([i for i in range(1,100000001)])

But when I ran that, my computer slowed down to a crawl. Who would have thought allocating space in memory for one hundred million integers would slow things down? A better solution is necessary, enter the generator!

total = sum(i for i in range(1,100000001)) #With python 2.x, you'll want to use xrange()

By using a generator, we evaluate total 1 value at a time (lazily). By doing it this way, we don't have to allocate space for 100,000,000 integers in memory at the same time. Don't misunderstand, this code will take a while to compute (it did on mine at least), but, it's far faster and less resource intensive than the comprehension approach. My generator found the total in about 30 seconds on my machine, the comprehension has taken something around 10 minutes.

As you can see, generators provide an elegant solution to a real problem. However, this is not what I love most about generators. What I love most about generators is their utility in infinite sets. In the snippets section, you'll find quite a few generators for infinite sets that I've submitted so far. One tried and true snippet is the Fibonacci generator:

__author__ = "atraub" __date__= "6/13/2012" #optionally supply the first two values of generator to seed it def fibonacciGenerator(previous=0,current=1): """Creates a Fibonacci Generator""" #handle the first two values if previous > current: previous,current = current, previous yield previous yield current #handle all subsequent values while True: current,previous = previous+current,current yield current

This generator will infinitely generate fibonacci values. Here's an example of the Fibonacci generator in action (ran from the Python shell):

>>> x = fibonacciGenerator() >>> for i in range(20): print(next(x)) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181

In this example, we create a generator and then use it to find the first 20 Fibs. If we want the next 20, we can simply run the loop again and our generator will pick right back up where we left off. Another cool aspect of generators is that they can be paired with list comprehensions, as you can see in the following interactive session:

>>> x = fibonacciGenerator() >>> y = [next(x) for i in range(20)] >>> print(y) [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]

This is pretty cool. Sometimes, a generator can't easily be expressed in a simple function. For this, Python allows us to make object oriented Generators. For this, I created a generator to create the values of the Kolakoski sequence.

__author__ = "atraub" __date__= "6/09/2012" class Kolakoski: """Class based generator for the Kolakoski sequence""" def __init__(self): """Sets up the generator""" self.sequence = [] self.i = 0 self.currentNum = 1 self.nextNum = 2 def __next__(self): """Implements next() in Python 3""" return self.__realNext() def next(self): """Implements next() in Python 2""" return self.__realNext() def __realNext(self): """Underlying function to get the next value in the sequence""" if self.i == 0: """Base case for first value""" retVal = [1] elif self.i == 1: """Base case for second value""" retVal = [2,2] else: """Default case for all values after the first 2""" retVal = self.__getNext() self.__update(retVal) return retVal def __getNext(self): """Gets the next value after the first two""" return [self.currentNum for a in range(self.sequence[self.i])] def __update(self,retVal): """Handles upkeep of the generator""" self.i += 1 #increments the index self.sequence.extend(retVal) #Adds newest value to underlying sequence self.currentNum,self.nextNum = self.nextNum,self.currentNum #sets the current number def __str__(self): """Creates a string representation of the underlying sequence""" return ",".join([str(a) for a in self.sequence]) def getLast(self): """gets the last value generated by the sequence""" return self.last

Object oriented generators aren't extremely tough. The one nuance is that Python3 requires a method called __next__ and python2 requires the method be named next. This isn't the time or place to walk through the code itself, but it's not overly complex once you have a firm understanding of how the Kolakoski sequence is generated. Despite the different appearance of this generator, calling on it is still syntactically identical. Unfortunately, the Kolakoski sequence still requires an underlying list to generate values, and as of this iteration I wasn't able to find a way to generate values without it, thus, this generator is not ideal for infinite sets. (On a side note, I did develop a version that could save memory by removing elements from the underlying list that were no longer necessary to generate new values, however, the nature of the sequence dictates that even with this optimization, the list will still grow infinitely huge).

I hope that through this tutorial, you've seen that generators do have a lot of power and utility. Whether you want to generate values from infinite sets or perform calculations on massive sets of values, a generator may be the solution you've been looking for.

This post has been edited by **atraub**: 29 July 2012 - 08:42 AM