
Generators: An Introduction

#1 scalt

Posted 02 November 2010 - 05:52 PM

Iteration over a group of values (be it a ‘list’, ‘dictionary’, string, file, etc.) is a fairly standard procedure in pretty much any programming language. This is particularly true of Python, where ‘for’ loops are built on the concept of a ‘for each element in list’ style of loop rather than the traditional counter-based loop found in C-style languages:
 for(i = 0; i < 10; i++) 


Now imagine that instead of having to generate an entire list then iterate over it, you could just generate it on-the-fly as you iterate over it. This is what a ‘generator’ does. What does a generator look like, you say? Exhibit A:
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1



called thus:

for i in generate_range(10):
    print(i)



giving:

	
0
1
2
3
4
5
6
7
8
9



Cool, eh? Notice anything fancy about the function? Something to do with the lack of a ‘return’ statement? That’s what the ‘yield’ is for ;). A ‘yield’ statement hands a value back to the caller much like a ‘return’ does, yet it is not terminal: the function is paused rather than ended.

As far as you are concerned, the loop calls your function, which executes as normal until the ‘yield’ bit, at which point it returns the value specified by the ‘yield’ and then pauses! This value is then assigned to ‘i’ (in the calling ‘for’ loop) and the calling loop goes about its merry way, in this case printing ‘i’. When the calling loop finishes its current iteration it goes back to your paused generator, which resumes from the ‘yield’ statement and runs until it either hits a ‘yield’ again or reaches the end of the function. If it hits a ‘yield’, the above process repeats itself. If instead the function reaches its end, the calling loop simply behaves as though it has reached the end of a normal ‘list’ and stops, allowing the code to continue on to the next bit.
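If you want to watch the pause-and-resume behaviour without a 'for' loop hiding it, you can drive a generator by hand with the built-in 'next()'. This is just a rough sketch: 'generate_range' is the function from above, while the 'steps' list is something I've made up purely to record what happens. It uses the 'print()' call form so it runs under Python 3.

```python
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1

gen = generate_range(3)   # nothing inside the function has run yet

steps = []                # illustrative only: records what we get back
steps.append(next(gen))   # runs up to the first 'yield', then pauses
steps.append(next(gen))   # resumes after the 'yield', loops, pauses again
steps.append(next(gen))

try:
    next(gen)             # the function body ends, so nothing is left
except StopIteration:
    # a 'for' loop catches this for you and simply stops looping
    steps.append('finished')

print(steps)
```

Each 'next()' call does exactly one "resume, run to the next 'yield', pause" cycle, which is what the 'for' loop does for you behind the scenes.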

A practical example of why you may want to use this is to get a list of the contents of a directory. Instead of building your own loop each time you want to do an 'os.walk' (incidentally, this is also a generator function), you can simply make a generator function and iterate through it with a far simpler loop. This is really handy if you have a module you use to hold little functions like this, because you only ever need to write it once!

Because it does things on the fly this can have performance (speed and memory) advantages because it doesn't need to read the ENTIRE directory tree into a list first. The tradeoff with this is that using a generator is a 1-way trip. There is no way to go back to a previous value unless you restart the whole thing again (or write some fancy wrapper code that stashes the values from your generator in a list).

import os

#generator
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path,fn)

#usage
for fn in getallfiles('C:\\myfolder'):
    print(fn)



The 'for' loop here will simply print all the file names yielded by your generator (as you can see, it is looking inside 'C:\\myfolder').
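The '1-way trip' tradeoff mentioned earlier is easy to see for yourself. The following is a small sketch (Python 3 syntax) that reuses 'generate_range' from the start of the tutorial so it doesn't depend on any real folder. The variable names are just ones I've picked for illustration.

```python
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1

gen = generate_range(5)

first_pass = list(gen)    # consumes every value the generator has
second_pass = list(gen)   # the generator is already exhausted

# The simple workaround: stash the values in a list if you need them twice
stash = list(generate_range(5))
total = sum(stash)
largest = max(stash)

print(first_pass)    # all five values
print(second_pass)   # an empty list
```

Stashing the values of course gives up the memory advantage, so it only makes sense when the data is small enough to hold in one go.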

As well as full-fledged functions, you can also build generator 'one-liners' (Python calls these generator expressions). These are really cool because you can easily start plugging them together to do tasks such as filtering the files provided by the function above, plus they tend to read far more easily than a standard loop. The following is a very basic example that ends up returning all '.txt' files whose full paths (including folder names) do not contain the letter 'a' (usually you would use 'fnmatch' to do this, but I'm trying to keep it simple):

import os

#generator
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path,fn)

endwith_txt = (fn for fn in getallfiles('C:\\myfolder') if fn.endswith('.txt'))
#compare by value, not identity ('is 0' only works by accident)
excluding_a = (fn for fn in endwith_txt if 'a' not in fn)

for fn in excluding_a:
    print(fn)



The cool thing about this is that the generators form a 'pipeline', so even when you are setting up the last 2, which refer to other generators, nothing is actually run until you start iterating over 'excluding_a' in your for loop, at which point the collection of generators cranks into action and begins spitting out file names.
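To convince yourself that nothing runs until you start iterating, here is a minimal sketch of a pipeline that logs each value as it is produced. The names 'numbers', 'log', 'evens' and 'squares' are invented for this illustration and are not part of the file example above.

```python
log = []   # illustrative: records when the source generator actually does work

def numbers(lim):
    for n in range(lim):
        log.append('produced %d' % n)
        yield n

# Setting up the pipeline: no values are produced yet
evens = (n for n in numbers(6) if n % 2 == 0)
squares = (n * n for n in evens)

log_before = list(log)        # snapshot taken before any iteration

first = next(squares)         # pulls exactly one value through the pipeline
log_after_one = list(log)     # only the work needed for that value was done

rest = list(squares)          # now drain whatever is left
```

Only the values needed to answer each request are pulled through, one at a time, which is why the pipeline never has to hold the whole data set in memory.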

The applications for generators are endless. An example project I did was to get all the 'txt' files from a folder and go through them line-by-line, picking out and processing any line that started with 'date:', breaking each one up by ',' and summing the 2nd column, ie:

import os

#list files
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path,fn)

#given a generator that yields open file objects (open('filename')),
#   yield the lines from those files
def catallfiles(openfile_gen):
    for openfile in openfile_gen:
        for line in openfile:
            yield line

#get all files
files = getallfiles('C:\\benj\\temp')
#filter files (*.txt)
files = (fn for fn in files if fn.endswith('.txt'))

#open files
openfiles = (open(fn) for fn in files)

#get all lines
lines = catallfiles(openfiles)
#filter lines (line starts with 'date:' only)
lines = (line for line in lines if line[:5] == 'date:')

#split lines
data = (line.split(',') for line in lines)
#get 2nd col ('1th') and convert to int
data = (int(split[1]) for split in data) 

#hey look, 'sum' consumes any iterable (not just a list)!
print(sum(data))



Or, if you want to 'simplify' it a little:

import os

#list files
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path,fn)

#given a generator that yields open file objects (open('filename')),
#   yield the lines from those files
def catallfiles(openfile_gen):
    for openfile in openfile_gen:
        for line in openfile:
            yield line

#find .txt files and open them
openfiles = (open(fn) for fn in getallfiles('C:\\benj\\temp') if fn.endswith('.txt'))

#string their contents together, only interested in lines starting with 'date:'
lines = (line for line in catallfiles(openfiles) if line[:5] == 'date:')

#split lines and convert 2nd col to int (returning only that)
data = (int(line.split(',')[1]) for line in lines)

#tada!
print(sum(data))
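If you don't have a suitable folder lying around, the pipeline can be tried out against a throwaway one. This is only a sketch under a few assumptions of mine: the sample file names and contents are made up, the folder comes from 'tempfile.mkdtemp()', and I've added a 'with' inside 'catallfiles' so each file gets closed once its lines are used up (the version above leaves them open). Python 3 syntax.

```python
import os
import shutil
import tempfile

def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path, fn)

def catallfiles(openfile_gen):
    for openfile in openfile_gen:
        with openfile:                 # my addition: close each file when done
            for line in openfile:
                yield line

# Made-up sample data, purely for illustration
folder = tempfile.mkdtemp()
samples = {
    'a.txt': 'date:2010-11-01,5,x\nnote:ignored,100\n',
    'b.txt': 'date:2010-11-02,7,y\n',
    'c.log': 'date:2010-11-03,1000,z\n',   # wrong extension, so skipped
}
for name, text in samples.items():
    with open(os.path.join(folder, name), 'w') as f:
        f.write(text)

try:
    openfiles = (open(fn) for fn in getallfiles(folder) if fn.endswith('.txt'))
    lines = (line for line in catallfiles(openfiles) if line.startswith('date:'))
    data = (int(line.split(',')[1]) for line in lines)
    total = sum(data)                  # this is the point where everything runs
finally:
    shutil.rmtree(folder)              # tidy up the throwaway folder

print(total)
```

With the sample data above, only the two 'date:' lines in the '.txt' files contribute, so the total is 5 + 7.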




Done! Hopefully this was vaguely informative and not utterly confusing. If you have any questions, just leave me a comment!


Replies To: Generators

#2 CheckersW

Posted 01 March 2011 - 09:54 PM

Really well explained and nicely applied. And THANK YOU for commenting on your code!

#3 skorned

Posted 02 June 2011 - 10:28 AM

This just blew my mind. I had no idea 'yield' existed. Is it considered good programming practice, or is it like goto? If your code doesn't call the paused function again, does the function remain paused on the stack forever?

#4 scalt

Posted 02 June 2011 - 02:44 PM

Using 'yield' is good programming practice, provided you use it in the right situations. Because functions with 'yield' in them essentially produce iterators, I would only use them in circumstances where an iterator makes sense. Usually (but not always) this would involve processing a list of values, or a file/database/whatever.

If you don't 'finish' your generator (ie don't run it through to the end), you can manually stop it so it doesn't hang around anymore using the 'close()' method (automatically built into all generator objects by Python), ie:
def mygen():
    for i in range(100):
        yield i

g = mygen()

for i in g:
    if i < 10:
        print(i)
    else:
        g.close()

print("Done")



In this case the above loop will print the numbers 0 - 9 then exit. By calling 'g.close()' you are essentially telling 'g' to behave as though it has finished, which includes telling it to forget where it was up to and just skip to the end. The next time the loop comes around, 'g' will throw its 'StopIteration' exception which is automatically handled for you by the 'for' loop. The loop will just behave as though 'g' was a list of length '10' and your next bit of code will execute (in this case - 'print "Done"').
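One way to see what 'close()' does inside the generator is to give it a 'try/finally'. When 'close()' is called, Python raises 'GeneratorExit' at the paused 'yield', so the 'finally' block gets a chance to run. The sketch below is a cut-down variation of the example above (Python 3 syntax). The 'events' and 'seen' lists are invented here just to record what happens.

```python
events = []   # illustrative: records when the generator tidies up

def mygen():
    try:
        for i in range(100):
            yield i
    finally:
        # runs when the generator finishes normally OR when close() is called
        events.append('cleaned up')

g = mygen()
seen = []
for i in g:
    if i < 3:
        seen.append(i)
    else:
        g.close()     # generator stops here; the loop ends on its next pass

leftover = list(g)    # a closed generator has nothing more to give
```

So a generator that you close (or that simply gets garbage collected) gets the opportunity to release anything it was holding, such as an open file.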

'.close()' works for 1-line generators as well, ie:
g = (i for i in range(100))
g.close()



#5 atraub

Posted 10 May 2012 - 09:15 AM

I decided to write a tutorial that would accompany this one. I'll link it when it's done.

This post has been edited by atraub: 10 May 2012 - 03:53 PM


#6 k3y

Posted 12 July 2012 - 07:14 PM

Wow, that was an awesome tutorial. I didn't read the last half but the first half was great. I plan on reading more, after I do some other shenanigans.
