for(i = 0; i < 10; i++)
Now imagine that instead of having to generate an entire list then iterate over it, you could just generate it on-the-fly as you iterate over it. This is what a ‘generator’ does. What does a generator look like, you say? Exhibit A:
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1

for i in generate_range(10):
    print(i)
0 1 2 3 4 5 6 7 8 9
Cool eh? Notice anything fancy about the function? Something to do with the lack of a ‘return’ statement? That’s what the ‘yield’ is for. A ‘yield’ statement does the same kind of job as a ‘return’ (it hands a value back to the caller), but it is not terminal: the function is paused rather than ended.
As far as you are concerned, the loop calls your function, which executes as normal until the ‘yield’ bit, at which point it returns the value specified by the ‘yield’ and then pauses! (Strictly speaking, calling the function hands back a generator object, and it is the loop asking that object for its next value that runs your code, but the effect is the same.) This value is then assigned to ‘i’ (in the calling ‘for’ loop) and the calling loop goes about its merry way, in this case printing ‘i’. When the calling loop finishes its current iteration it goes back to your paused generator, which resumes from the ‘yield’ statement and runs until it either hits a ‘yield’ again or reaches the end of the function. If it hits a ‘yield’, the above process repeats itself. If the function exits instead, the calling loop simply behaves as though it has reached the end of a normal ‘list’ and stops, allowing the code to continue on to the next bit.
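To see the pausing for yourself, you can drive the generator by hand with the built-in 'next' instead of a 'for' loop. This is only a sketch: 'generate_range' is the function from above, repeated so the snippet runs on its own, and 'steps' is a name made up for the illustration.

```python
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1

gen = generate_range(2)   # nothing in the function body has run yet
steps = []
steps.append(next(gen))   # runs up to the first 'yield' and pauses, giving 0
steps.append(next(gen))   # resumes after the 'yield', loops round, giving 1

try:
    next(gen)             # the 'while' test fails and the function ends
except StopIteration:
    steps.append('done')  # this is the signal a 'for' loop uses to stop

print(steps)
```

A 'for' loop does the same thing behind the scenes: it keeps asking for the next value and quietly stops when it sees 'StopIteration'.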
A practical example of why you may want to use this is getting a list of the contents of a directory. Instead of building your own loop each time you want to do an 'os.walk' (incidentally, this is also a generator function), you can simply make a generator function and iterate through it with a far simpler loop. This is really handy if you have a module you use to hold little functions like this, because you only ever need to write it once!
Because it does things on the fly, this can have performance (speed and memory) advantages: it doesn't need to read the ENTIRE directory tree into a list first. The tradeoff is that using a generator is a 1-way trip. There is no way to go back to a previous value unless you restart the whole thing again (or write some fancy wrapper code that stashes the values from your generator in a list).
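The 1-way trip is easy to demonstrate. In this small sketch (the variable names are invented for the example, and 'generate_range' is repeated from earlier so it runs on its own), the second pass over the same generator comes back empty, and 'list' is used as the simplest possible stash:

```python
def generate_range(lim):
    num = 0
    while num < lim:
        yield num
        num += 1

gen = generate_range(3)
first_pass = [n for n in gen]    # consumes the generator
second_pass = [n for n in gen]   # already exhausted, so nothing is left

# If you need the values more than once, stash them in a list
# (at the cost of holding them all in memory again).
stashed = list(generate_range(3))
total = sum(stashed)
biggest = max(stashed)           # the list can be reused as often as you like

print(first_pass)
print(second_pass)
```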
import os

# generator
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path, fn)

# usage
for fn in getallfiles('C:\\myfolder'):
    print(fn)
The 'for' loop here will simply go and print all the file names yielded by your generator (as you can see, it is looking inside 'C:\\myfolder').
As well as full-fledged functions, you can also build generator 'one-liners' (properly known as generator expressions). These are really cool because you can easily start plugging them together to do tasks such as filtering the files provided by the function above, plus they tend to read far more easily than a standard loop. The following is a very basic example that ends up yielding all '.txt' files whose filenames (incl folder names) do not contain the letter 'a' (usually you would use 'fnmatch' to do this but I'm trying to keep it simple):
import os

# generator
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path, fn)

# keep only the .txt files
endwith_txt = (fn for fn in getallfiles('C:\\myfolder') if fn.endswith('.txt'))
# drop any path containing the letter 'a' (compare with ==, not 'is')
excluding_a = (fn for fn in endwith_txt if fn.count('a') == 0)

for fn in excluding_a:
    print(fn)
The cool thing about this is that the generators form a 'pipeline', so even when you are setting up the last 2, which refer to other generators, nothing is actually run until you start iterating over 'excluding_a' in your for loop, at which point the collection of generators cranks into action and begins spitting out file names.
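You can watch the pipeline being lazy with a little logging. This is a sketch rather than the real thing: 'fake_files' is a made-up stand-in for 'getallfiles' (so no real folder is needed), the file names are invented, and 'log' is just a list used to record the order in which things happen.

```python
log = []  # hypothetical list used to record the order things happen in

def fake_files():
    # stand-in for getallfiles(); the names are invented for the example
    for fn in ['notes.txt', 'data.txt', 'logo.png', 'todo.txt']:
        log.append('yielding ' + fn)
        yield fn

endwith_txt = (fn for fn in fake_files() if fn.endswith('.txt'))
excluding_a = (fn for fn in endwith_txt if 'a' not in fn)

log.append('pipeline built')   # the body of fake_files() has not run yet

results = []
for fn in excluding_a:
    log.append('got ' + fn)
    results.append(fn)

print(results)
for entry in log:
    print(entry)
```

The log starts with 'pipeline built', and after that the 'yielding' and 'got' entries are interleaved: each file name is pulled through the whole chain one at a time, rather than each stage finishing before the next one starts.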
The applications for generators are endless. An example project I did was to get all the 'txt' files from a folder and go through them line-by-line, picking out any line that started with 'date:', breaking each one up by ',' and summing the 2nd column, i.e.:
import os

# list files
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path, fn)

# given a generator that yields open file objects (open('filename')),
# yield the lines from those files
def catallfiles(openfile_gen):
    for openfile in openfile_gen:
        for line in openfile:
            yield line

# get all files
files = getallfiles('C:\\benj\\temp')
# filter files (*.txt)
files = (fn for fn in files if fn.endswith('.txt'))
# open files
openfiles = (open(fn) for fn in files)
# get all lines
lines = catallfiles(openfiles)
# filter lines (line starts with 'date:' only)
lines = (line for line in lines if line[:5] == 'date:')
# split lines
data = (line.split(',') for line in lines)
# get 2nd col ('1th') and convert to int
data = (int(split[1]) for split in data)
# hey look, 'sum' consumes any iterable (not just a list)!
print(sum(data))
Or, if you want to 'simplify' it a little:
import os

# list files
def getallfiles(folder):
    for path, dirlist, filelist in os.walk(folder):
        for fn in filelist:
            yield os.path.join(path, fn)

# given a generator that yields open file objects (open('filename')),
# yield the lines from those files
def catallfiles(openfile_gen):
    for openfile in openfile_gen:
        for line in openfile:
            yield line

# find .txt files and open them
openfiles = (open(fn) for fn in getallfiles('C:\\benj\\temp') if fn.endswith('.txt'))
# string their contents together, only interested in lines starting with 'date:'
lines = (line for line in catallfiles(openfiles) if line[:5] == 'date:')
# split lines and convert 2nd col to int (returning only that)
data = (int(line.split(',')[1]) for line in lines)
# tada!
print(sum(data))
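If you want to check the filtering and summing steps without a folder of files to hand, the same chain of generator expressions works on any iterable of strings. A minimal sketch, assuming lines shaped like 'date:something,number,...' (the sample data and the names 'sample_lines', 'rows' and 'values' are invented for the example):

```python
# Invented sample data standing in for the lines of the .txt files
sample_lines = [
    'date:2009-01-01,5,apples\n',
    'comment: this line is ignored\n',
    'date:2009-01-02,7,pears\n',
    'date:2009-01-03,30\n',
]

# keep only the lines that start with 'date:'
lines = (line for line in sample_lines if line.startswith('date:'))
# break each line up by ','
rows = (line.split(',') for line in lines)
# index 1 is the 2nd column; int() ignores the trailing newline
values = (int(row[1]) for row in rows)

total = sum(values)
print(total)
```

Here the three 'date:' lines contribute 5, 7 and 30, so the total printed is 42.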
Done! Hopefully this was vaguely informative and not utterly confusing. If you have any questions, just leave me a comment!