Python List Comprehensions
JeanFrancoisPuget 2700028FGP Comments (4) Visits (5401)
Do you write loops over array indices in Python? This would not be surprising if you have good programming skill in programming languages such as C, C++, or Java. Issue is that loops over array indices really hurt performance in Python unless you compile your code with Cython or Numba.
Alright, you heard this before: one should not loop over array indices in Python.
What can you use then? Depending on the task to be performed, Python offers a variety of approaches. I would like to describe one that is underused IMHO, namely list comprehensions. I will also touch upon array slicing.
Let's start with a simple example. We create a list.
x = list(range(10)) print(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I'm using Python 3.6 here, but the code could be run with Python 2.7, except for print where one needs to remove brackets.
Now, let's assume we want the square of the elements in our list. Python newcomers will probably write code similar to this:
def get_square_loop(x): x2 =  for xi in x: x2.append(xi*xi) return x2
yields what we want:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The above code is correct, but it is inefficient. List comprehensions are meant for case like this. The idea of comprehension is to move the for loop inside the construction of the result list:
def get_square_list(x): return [xi*xi for xi in x]
This is both simpler and faster than the previous code.
Let us now look at a more complex example. We want to define a function every_repeat(x, span, step) that takes 3 arguments. The function should return span consecutive elements from x starting at every multiple of step. For instance, starting from this list
x = list(range(1, 25)) prin
[1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 21, 22, 23, 24]
List comprehensions can do the magic very elegantly:
import numpy as np def every_repeat(x, span, step): x = np.asarray(x) return [a for k in range(0, len(x), step) for a in x[k:k+span] ]
This code maybe be a little cryptic. Let's see how we could get to that code step by step. First, we know we will need to access some elements in the list, not all. Hence we will not use a simple list traversal. In this case, we need random access to list elements. In that case it is faster to use arrays than lists. That's what the first line in the body of the function does: it transforms the input list into a numpy array. numpy is the Python package for arrays.
Now that x is an array, we can access elements in it in any order (hence the name random access). If we use loops we could write code like this:
def every_repeat(x, span, step): x = np.asarray(x) out =  for k in range(0, len(x), step): for i in range(span): out.append(x[k+i]) return out
This looks good but it triggers an error when we call
Why? because we try to access the element of index 24. We must make sure we do not overflow the list. Or code becomes:
def every_repeat(x, span, step): x = np.asarray(x) out =  for k in range(0, len(x), step): for i in range(span): if k+i < len(x): out.append(x[k+i]) return out
This is correct, but frankly, it looks more like C code than Python. And it is inefficient. A first optimization is to replace the inner loop by array slicing. The list of the elements x[k+i] can be obtained by a slice of the array:
def every_repeat(x, span, step): x = np.asarray(x) out =  for k in range(0, len(x), step): for xi in x[k : k+span]: out.append(xi) return out
Notice how slicing avoid overflow transparently. Notice also that we no longer loop on the array via indices. We loop directly over array elements, which is faster.
Now we have a familiar pattern: we have two nested loops that add elements to a list one at a time. This is what list comprehensions are for. We simply need to move the loops inside the list creation, in the same order as they appear in the above code:
def every_repeat(x, span, step): x = np.asarray(x) return [ xi for k in range(0, len(x), step) for xi in x[k:k+span] ]
Next time you add elements to a list one by one think of list comprehensions, your code will be more readable, and more efficient.