Understanding a new concept
The idea of iterators was introduced into Python with version 2.2. Well, that is not quite true; hints of the idea were already present in the older function
xrange(), and the file method
.xreadlines(). Python 2.2 generalized the notion in much of its internal implementation and made programming custom iterators much more straightforward by introducing the
yield keyword (the presence of
yield turns a function into a generator, which in turn returns an iterator).
The motivation behind iterators is twofold. Working with data as sequences is often the most straightforward approach, and a sequence that is processed in linear order often does not need to actually exist all at once.
x*() premonitions provide clear examples of these principles. If you want to do something a billion times, your program will probably take a while to execute, but there is no general need for it to claim a lot of memory. Likewise, for many types of files, processing can be a line-by-line matter, and there is no need to store the whole file in memory. All sorts of other sequences can best be treated lazily too; they might depend on data arriving incrementally over a channel, or upon a computation that proceeds step-by-step.
Most of the time, an iterator is utilized within a
for loop, just like a true sequence is. Iterators provide a
.next() method that can be invoked explicitly, but 99 percent of the time, what you will see is along the lines of:
for x in iterator: do_something_with(x)
The loop is terminated when the behind-the-scenes call to
iterator.next() raises a
StopIteration exception. By the way, a true sequence can be turned into an iterator by calling
iter(seq) -- this will not save any memory, but it can be useful in the functions discussed below.
Python's evolving split personality
There is something schizophrenic in Python's attitudes towards functional programming (FP). On the one hand, many Python developers disparage the traditional FP functions
reduce(), usually recommending using list comprehensions in their place. But the whole of the
itertools module is composed of functions of the very same sort, merely operating on "lazy sequences" (iterators) rather than on completed sequences (lists, tuples). Furthermore, there is no syntax in Python 2.3 for "iterator comprehensions," which would seem to have to same motivation as list comprehensions.
I suspect Python will eventually grow some form of iterator comprehension, but it depends on finding a suitably natural syntax for them. In the meanwhile, we have a number of useful combinatorial functions in the
itertools module. Broadly, each of these functions accept some parameters (usually including some basis iterators) and return a new iterator. For example, the functions
izip() are directly equivalent to the respective built-ins that lack the initial
There is no
itertools, although it might seem natural; a possible Python implementation is:
Listing 1. Sample implementation of ireduce()
def ireduce(func, iterable, init=None): if init is None: iterable = iter(iterable) curr = iterable.next() else: curr = init for x in iterable: curr = func(curr, x) yield curr
The use case for
ireduce() is similar to that for
reduce(). For example, suppose you want to add a list of numbers contained in a large file, but stop when a condition is met. You could monitor the ongoing total with:
Listing 2. Adding and totaling a list of numbers
from operator import add from itertools import * nums = open('numbers') for tot in takewhile(condition, ireduce(add, imap(int, nums)): print "total =", tot
A more real-world example is probably something like applying a stream of events to a stateful object, such as to a GUI widget. But even the simple example above shows the FP flavor of iterator combinators.
Basic iterator factories
All the functions in
itertools can easily be implemented in pure Python as generators. The main point of including the module in Python 2.3+ is to provide standard behaviors and names for some useful functions. While programmers could write their own versions, each one would create a slightly incompatible variation in practice. The other point, however, is to implement iterator combinators in efficient C code. Using
itertools functions will be a bit faster than writing your own combinators. The standard documentation shows equivalent pure-Python implementations for each
itertools function, so there is no need to repeat those in this article.
The functions in
itertools are basic enough -- and distinctly named enough -- that it probably makes sense to import all the names from the module. The function
enumerate(), for example, might sensibly live in
itertools, but is instead a built-in function in Python 2.3+. Notably, you can easily express
enumerate() in terms of
from itertools import * enumerate = lambda iterable: izip(count(), iterable)
Let's look first at the few
itertools functions that do not use other iterators as a basis, but simply create iterators "from scratch."
times() returns an iterator that yields an identical object multiple times; in itself, this capability is moderately useful, but it is really nice as a substitute for the superfluous use of
xrange() and index variable to simply repeat an action. That is, rather than:
for i in xrange(1000): do_something()
You can now use the more neutral:
for _ in times(1000): do_something()
If no second argument is given to
times(), it simply yields
None repeatedly. The function
repeat() is similar to
times(), but unboundedly returns the same object. This iterator is useful either where a loop has an independent
break condition, or within combinators like
count() is a bit like a cross between
count() returns consecutive integers unboundedly (starting at an optional argument). However, given that
count() does not currently support overflow to longs correctly now, you might as well use
xrange(n,sys.maxint) still; it's not literally unbounded, but for most purposes it amounts to the same thing. Like
count() is particularly useful inside other iterator combinators.
A few of the real combinatorial functions in
itertools have been mentioned in passing.
imap() act just as you would expect from their corresponding sequence functions.
ifilterfalse() is a convenience so that you do not need to negate a predicate function in a
lambda or a
def (and it saves significant function call overhead). But functionally, you could define
ifilterfalse() (approximately, ignoring the
None predicate) as:
def ifilterfalse(predicate, iterable): return ifilter(lambda predicate: not predicate, iterable)
takewhile() partition an iterator by a predicate. The former ignores yielded elements until a predicate is fulfilled, the latter yields while a predicate holds.
dropwhile() skips an indefinite number of initial elements of an iterator, so it might not begin iterating until after a delay.
takewhile() starts right away, but terminates the iterator if the passed-in predicate become true.
islice() is basically just the iterator version of list slicing. You can specify a start, stop, and step, just as with regular slices. If a start is given, a number of elements are dropped until the passed-in iterator reaches the element of interest. This is another case where I think refinement to Python is possible -- the best thing would be for iterators to simply recognize slices, just as lists do (as a synonym for what
One final function,
starmap(), is a slight variation on
imap(). If the function passed in as an argument takes multiple arguments, the iterable passed should yield tuples of the right size. Basically, this is the same as
imap() with several iterables passed in, only with the collection of iterables previously combined with
Beyond the basics
The functions included in
itertools make for a good start. If nothing else, they encourage Python programmers to become more comfortable with utilizing and combining iterators. In a general way, widespread use of iterators is clearly important to the future of Python. But past what is included, there are some others that I would recommend for future updates to the module. You can easily use these immediately -- of course, if they are later included, the names or interface details could differ.
One category that would seem to be of general use is functions that take multiple iterables as arguments, then yield individual elements from each iterable. In contrast to this,
izip() yields tuples of elements, and
imap() yields values computed from the basis elements. The two obvious arrangements, to my mind, are
weave(). The first is similar in effect to concatenation of sequences (but appropriately lazy). That is, where for plain sequences you might use, for example:
for x in ('a','b','c') + (1, 2, 3): do_something(x)
for general iterables, you could use:
for x in chain(iter1, iter2, iter3): do_something(x)
A Python implementation is:
Listing 3. Sample implementation of chain()
def chain(*iterables): for iterable in iterables: for item in iterable: yield item
With iterables, you might also combine several by interspersing them. There is no built-in syntax to do the same with sequences, but
weave() itself works fine for completed sequences also. A possible implementation is below (Magnus Lie Hetland proposed a similar function on comp.lang.python):
Listing 4. Sample implementation of weave()
def weave(*iterables): "Intersperse several iterables, until all are exhausted" iterables = map(iter, iterables) while iterables: for i, it in enumerate(iterables): try: yield it.next() except StopIteration: del iterables[i]
Let me illustrate the behavior of
weave() since it might not be immediately obvious from the implementation:
>>> for x in weave('abc', xrange(4), [10,11,12,13,14]): ... print x, ... a 0 10 b 1 11 c 2 12 13 3 14
Even after some iterators are exhausted, the remaining ones continue to yield values, until everything available is yielded at some point.
I will propose just one more possible
itertools function. This one owes a lot to functional programming ways of conceiving problems.
icompose() has a certain symmetry with the function
ireduce() suggested above. But where
ireduce() feeds a (lazy) sequence of values to a function, and yields each result,
icompose() applies a sequence of functions to the return value of each predecessor function. A likely use of
ireduce() is to feed a sequence of events to a long-lived object.
icompose() might instead repeatedly pass an object to a mutator function that returns a new object. The first is a fairly traditional OOP way of thinking about events, while the second is more of an FP way of thinking.
Here is a possible implementation of
Listing 5. Sample implementation of icompose()
def icompose(functions, initval): currval = initval for f in functions: currval = f(currval) yield currval
Iterators -- conceived as lazy sequences -- are a powerful concept that opens new styles of Python programming. There is a subtle difference, though, between thinking of an iterator as just a data source and thinking of it in a sequential style. Neither way of thinking is more true per se, but the latter opens the path towards a combinatorial shorthand for manipulating programmatic events. The combinatorial functions in
itertools (and especially some it might grow, like those I suggest) come close to a declarative style of programming. To my mind, these declarative styles are generally less error-prone, and more powerful.
- Read the previous installments of Charming Python.
- Iterators and simple generators (developerWorks, September 2001) gives a gentle introduction to generators -- introduced in Python 2.2 -- and iterators.
- David looks at ways to reduce Python to a set of declarative elements in Create declarative mini-languages (developerWorks, February 2003).
- The official documentation of itertools includes pure Python work-alikes for each of the functions, as well as general explanation.
- Much of the inspiration for itertools -- as for previously introduced list comprehensions -- comes from the thoroughly lazy language Haskell. You can see some of the many similar functions available in Haskell's standard library. If that documentation of Haskell list functions looks entirely mysterious, you might first want to read some of the introductory parts in the Haskell 98 Report.
- For an overview of functional programming in Python, check out David's "Functional programming in Python" series on developerWorks. Part 1 covers concepts of functional programming and shows how to implement functional techniques in Python. In Part 2, David demonstrates several intermediate and advanced functional programming concepts. And Part 3 illustrates additional capabilities, such as currying and other higher-order functions contained in the Xoltar Toolkit.
- Find more Linux articles in the developerWorks Linux zone.