
List Comprehensions: An Introduction

#1 andrewsw

Posted 26 October 2015 - 08:28 AM

As well as providing an introduction to list comprehensions, this tutorial also provides a walkthrough of a more complicated example. That is, I show a typical solution to a problem and then show the steps taken to write an alternative solution using a list comprehension.

docs said:

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.

List Comprehensions: the docs

We can use the range() function to create a sequence of values, and a standard for-loop to iterate through a list:
for x in range(1, 4):
    print(x) # 1 2 3

for x in [1, 2, 3]:
    print(x) # 1 2 3


(Note that the second argument to range(), stop, is not included in the output.)

With a list comprehension we can create a list in-place. Here is our first, basic example to demonstrate the syntax:
print([x for x in range(1, 4)])
# [1, 2, 3]


docs said:

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it.

The expression between the square brackets creates a new list, a sequence, of values. (I prefer to call it a compound expression as it has a few parts.)

The phrase x for x in is a little confusing, and it doesn't help that we write it from left-to-right but interpret it (more or less) from right-to-left. But let's break it down.

Let's read it this way first, for x in range(1, 4). This is a familiar phrase. As with a standard for-loop, x becomes the placeholder for each value, each item, in the list (the range()).

x for x states what each value in our new list consists of. In this case, it just consists of each value x from the original range, unaltered. This becomes much clearer if we alter the returned values in some way:
print([2 * x for x in range(1, 4)])
# [2, 4, 6]


So, although x represents each value in the original range, it is twice this value that is added as an item to our new list.
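To make the correspondence explicit, here is a sketch of the same list built long-hand with a standard for-loop and append(). The name doubled is my own, chosen only for illustration.

```python
# Long-hand equivalent of [2 * x for x in range(1, 4)]
doubled = []
for x in range(1, 4):
    doubled.append(2 * x)   # the expression part of the comprehension

print(doubled)
# [2, 4, 6]
```

The expression at the front of the comprehension (2 * x) is what ends up inside append().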

Here is equivalent code using a list variable y:
y = [1, 2, 3]

print([x for x in y])
# [1, 2, 3]

print([2 * x for x in y])
# [2, 4, 6]


Here's an example converting a list of Celsius values to Fahrenheit:
celsius = [0, 37, 100]  # example values, chosen only for illustration
fahrenheit = [(9.0 / 5) * x + 32 for x in celsius]




At this point I should note that although list comprehensions are clever and expressive, they are not essential; nor are they necessarily more efficient (although they can be) than standard for-loop solutions. They can obfuscate code that would be better split over several lines with standard loop structures (and comments). I provide a more complicated example further down, and show the long and short versions, so that you can compare them.
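If you want to check the efficiency point on your own machine, something like the following rough sketch using the standard timeit module would do. The function names with_loop and with_comprehension are hypothetical, and I deliberately quote no timings because they vary with the machine and the Python version.

```python
import timeit

def with_loop():
    result = []
    for x in range(1000):
        result.append(2 * x)
    return result

def with_comprehension():
    return [2 * x for x in range(1000)]

# Both approaches build exactly the same list.
print(with_loop() == with_comprehension())
# True

# Time 1000 calls of each; compare the two numbers on your own machine.
print("loop:         ", timeit.timeit(with_loop, number=1000))
print("comprehension:", timeit.timeit(with_comprehension, number=1000))
```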

As always, use a feature like list comprehensions where it is appropriate, not just because you can.



Now that we can create new lists in place, we can pass them directly to functions such as sum:
print(sum([x for x in y])) # 6
print(sum(x for x in y)) # 6 (generator)


The second version, without square brackets, is a generator expression: it produces its values one at a time rather than building the whole list first. See this tutorial:

Generators An Introduction
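As a brief sketch of the practical difference (the names squares_list and squares_gen are mine, for illustration only): a list can be iterated as often as you like, whereas a generator is used up after one pass.

```python
y = [1, 2, 3]

squares_list = [x * x for x in y]   # built immediately and stored
squares_gen = (x * x for x in y)    # values produced on demand

print(squares_list)       # [1, 4, 9]
print(sum(squares_gen))   # 14
print(sum(squares_gen))   # 0 (the generator is already exhausted)
print(sum(squares_list))  # 14 (a list can be iterated again)
```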

We can add an if clause (a predicate) at the end to filter values:
print([x for x in y if x > 1])
# [2, 3]


To interpret such an expression I suggest that you initially ignore the if-part. Once you understand the list that will be created, then interpret the if-part, realising that it will filter values from the for sequence. In our example, only those values from y that are greater than 1 will be processed/included.
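If it helps, here is a sketch of the same filter written as a standard loop, followed by a version that both filters and alters the values. The name filtered is hypothetical.

```python
y = [1, 2, 3]

# Long-hand equivalent of [x for x in y if x > 1]
filtered = []
for x in y:
    if x > 1:               # the if clause of the comprehension
        filtered.append(x)

print(filtered)
# [2, 3]

# The filter is applied first, then the expression at the front.
print([2 * x for x in y if x > 1])
# [4, 6]
```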

Here's a text example, extracting numbers from a string:
sentence = "12 pints of lager and 5 bags of crisps"

numbers = [x for x in sentence.split() if x.isdigit()]

print(numbers)      # ['12', '5']


There can be more than one for clause, chaining sequences. The following creates a cross product with each fruit combined with each colour.
colours = ['red', 'green', 'blue']
fruits = ['apple', 'pear']

print([(x, y) for x in colours for y in fruits])
# [('red', 'apple'), ('red', 'pear'), ('green', 'apple'), ('green', 'pear'), ('blue', 'apple'), ('blue', 'pear')]
# cross product


PEP said:

The form [... for x... for y...] nests, with the last index varying fastest, just like nested for loops.

PEP 202
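To see what "just like nested for loops" means, here is a sketch of the colours and fruits example written with two nested loops. The name pairs is my own; the clauses of the comprehension appear in the same order as the loops, outermost first.

```python
colours = ['red', 'green', 'blue']
fruits = ['apple', 'pear']

pairs = []
for x in colours:        # first for clause: outer loop
    for y in fruits:     # second for clause: inner loop, varies fastest
        pairs.append((x, y))

print(pairs == [(x, y) for x in colours for y in fruits])
# True
```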

Here's an example which flattens a list of lists, together with its equivalent written with nested for-loops, which should make it easier to decipher:
a = [[1, 2, 3], [4, 5], [6], [7, 8, 9]]

print([x for y in a for x in y])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

newList = []
for y in a:
    for x in y:
        newList.append(x)

print(newList) # [1, 2, 3, 4, 5, 6, 7, 8, 9]


Comprehensions can also be (explicitly) nested, one within another. See here for example, and the docs.
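As a small sketch of explicit nesting, here is one way to transpose a matrix (swap its rows and columns). The names matrix and transposed are illustrative, and range(3) assumes that every row has exactly three items.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# The inner comprehension builds one row of the result; the outer one
# evaluates it once for each column index i.
transposed = [[row[i] for row in matrix] for i in range(3)]

print(transposed)
# [[1, 4], [2, 5], [3, 6]]
```

Unlike the chained for clauses above, which produce one flat list, this produces a list of lists.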

There are also dictionary and set comprehensions. Once you understand list comprehensions, these are similar.
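A quick sketch of both, using a made-up list of words: curly braces replace the square brackets, and a dictionary comprehension has a key: value pair at the front.

```python
words = ['apple', 'pear', 'plum', 'apple']

lengths = {w: len(w) for w in words}        # dictionary comprehension
unique_lengths = {len(w) for w in words}    # set comprehension

print(lengths)
# {'apple': 5, 'pear': 4, 'plum': 4}
print(unique_lengths == {4, 5})
# True
```

Duplicates disappear in both cases: a dictionary keeps one entry per key and a set keeps one copy of each value.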



A More Detailed Example and Walkthrough

My interest in writing this tutorial was triggered by a recent question here at DIC, starting with this list:
places = [
    ('London', 62.66, 4.54), ('New York', 35.20, 78.75), ('Dublin', -65.04, 70.04),
    ('Tokyo', 131.26, 71.82), ('San Marino', 71.00, 42.00), ('Budapest', 7.10, -10.50),
    ('Dubai', -12.80, -6.93), ('Moscow', -57.94, 19.32), ('Munich', 43.81, -38.43),
    ('Helsinki', -71.00, -47.25), ('Kiev', -51.00, 26.25), ('Vienna', 39.05, -27.93)]


The pairs of numbers are (x, y) coordinates, and the task was to determine the total distance covered when travelling between the places in order: from London to New York to Dublin, and so on.

The (approximate) formula to determine the distance between two points is

$$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$

which we write as a function:
import math

def calcDist(x1, y1, x2, y2):
    return math.sqrt(((x2-x1)**2)+((y2-y1)**2))
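As a quick sanity check of the function, here is a sketch using a 3-4-5 right-angled triangle, where the distance from (0, 0) to (3, 4) should be 5. The comparison with math.hypot() from the standard library is my own addition and is not part of the original question.

```python
import math

def calcDist(x1, y1, x2, y2):
    return math.sqrt(((x2-x1)**2)+((y2-y1)**2))

# sqrt(3**2 + 4**2) = sqrt(25) = 5
print(calcDist(0, 0, 3, 4))
# 5.0

# math.hypot() computes the same thing from the two differences.
print(math.hypot(3 - 0, 4 - 0))
# 5.0
```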


Here is my code that determines this distance without a list comprehension:
distance = 0.0
firstTime = True

for place in places:
    if firstTime:
        _, x1, y1 = place
        firstTime = False
        continue
    _, x2, y2 = place
    distance += calcDist(x1, y1, x2, y2)
    x1, y1 = (x2, y2)

print(distance) # 1011.8341311725462


We iterate over the list, accumulating the distance. On the first iteration we just need to note the starting point (for London). For each subsequent iteration we need to remember the current end-point, which becomes the next starting point (x1, y1).

The use of the underscore is a common idiom in Python, denoting that we are not interested in that particular value (the name of each place).

Rather than just leaping in to solve this with a list comprehension, I broke the problem down starting with a much simpler list and function.
x = [('a', 1, 1), ('b', 2, 2), ('c', 3, 3), ('d', 4, 4)]

def mult(a, b, c, d):
    return a * b * c * d


The end result will be (1 * 1 * 2 * 2) + (2 * 2 * 3 * 3) + (3 * 3 * 4 * 4) = 184.

To iterate the list in pairs I first use list slicing to create two versions of the list:
print(x[:-1])
# [('a', 1, 1), ('b', 2, 2), ('c', 3, 3)]

print(x[1:])
# [('b', 2, 2), ('c', 3, 3), ('d', 4, 4)]


Now that I have these two versions of the list, I need to be able to iterate them in pairs. The zip() function makes "an iterator that aggregates elements from each of the iterables".
print(*zip(x[:-1], x[1:]))
# (('a', 1, 1), ('b', 2, 2)) (('b', 2, 2), ('c', 3, 3)) (('c', 3, 3), ('d', 4, 4))


The star * in front unpacks the zip so that I can produce the printed output, confirming that zipping has worked.

You can think of the process as a physical zip, where each part of the zip is merged, interleaved, as the zip is pulled.

Now we can create the list comprehension:
print(sum([mult(a, b, c, d) for (_, a, b ), (__, c, d) in zip(x[:-1], x[1:])]))
# 184


Again, reading from right to left, we know what zip() produces. Each pair of values it returns becomes (_, a, b ), (__, c, d). Again, I am discarding the place-names using underscores.

Now that we have the four values we need (a, b, c, d), we can multiply them together with mult(a, b, c, d). These products are then summed and, finally, the result is printed.

(I could have re-used the underscore rather than doubling-up with __, although this does generate a minor warning.)

It is now just down to simple substitution to solve the original problem:
print(sum([calcDist(a, b, c, d) for (_, a, b ), (__, c, d) in zip(places[:-1], places[1:])]))
# 1011.8341311725462


Compare this with the earlier, more standard, code. Remember, I am not saying that the LC version is better than the original. It is more compact, but the original version is certainly easier to read and decipher.

This post has been edited by andrewsw: 28 October 2015 - 04:28 PM



#2 andrewsw

Posted 26 October 2015 - 09:36 AM

If it is of interest, there are other, and more efficient, ways to iterate a list in pairs, discussed here at SO.
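One possibility, sketched here on the simple list from the tutorial, is a helper built on itertools.tee(). This is my own illustration and not necessarily the approach recommended in that discussion; the name pairwise is one I have chosen, although Python 3.10 and later ship an itertools.pairwise() that does the same job. It avoids creating sliced copies of the list.

```python
from itertools import tee

def pairwise(iterable):
    """Yield consecutive overlapping pairs: (s0, s1), (s1, s2), ..."""
    first, second = tee(iterable)
    next(second, None)       # advance the second iterator by one item
    return zip(first, second)

x = [('a', 1, 1), ('b', 2, 2), ('c', 3, 3), ('d', 4, 4)]

def mult(a, b, c, d):
    return a * b * c * d

print(sum(mult(a, b, c, d) for (_, a, b), (__, c, d) in pairwise(x)))
# 184
```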

Even with my approach, zip(places, places[1:]) could be used instead of zip(places[:-1], places[1:]), as zip stops at the end of the shorter sequence and so ignores the extra (un-paired) item anyway. I'm happy with my version, though.
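Here is a short sketch confirming this on the simple list from the tutorial. The names explicit and shorter are mine.

```python
x = [('a', 1, 1), ('b', 2, 2), ('c', 3, 3), ('d', 4, 4)]

explicit = list(zip(x[:-1], x[1:]))
shorter = list(zip(x, x[1:]))   # zip stops when x[1:] runs out

print(explicit == shorter)
# True
print(len(shorter))
# 3
```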