Charming Python: Using combinatorial functions in the itertools module

Functional programming in Python becomes lazy

Python 2.2 introduced simple generators to the Python language and reconceived standard loops in terms of underlying iterators. With Python 2.3, generators become standard (no need for _future_), and the new module itertools is introduced to work flexibly with iterators. The itertools module is essentially a set of combinatorial higher-order functions, but ones that work with lazy iterators rather than with finite lists. In this installment, David explores the new module, and gives you a sense of the new expressive power available with combinatorial iterators.

Share:

David Mertz, Developer, Gnosis Software, Inc.

While David Mertz also likes hubris and impatience, this installment is about laziness. David may be reached at mertz@gnosis.cx; his life pored over at his personal Web page. David's book Text Processing in Python was recently published by Addison-Wesley; check it out. Suggestions and recommendations on this, past, or future columns are welcome.



12 June 2003

Also available in Russian Japanese

Understanding a new concept

The idea of iterators was introduced into Python with version 2.2. Well, that is not quite true; hints of the idea were already present in the older function xrange(), and the file method .xreadlines(). Python 2.2 generalized the notion in much of its internal implementation and made programming custom iterators much more straightforward by introducing the yield keyword (the presence of yield turns a function into a generator, which in turn returns an iterator).

The motivation behind iterators is twofold. Working with data as sequences is often the most straightforward approach, and a sequence that is processed in linear order often does not need to actually exist all at once.

The x*() premonitions provide clear examples of these principles. If you want to do something a billion times, your program will probably take a while to execute, but there is no general need for it to claim a lot of memory. Likewise, for many types of files, processing can be a line-by-line matter, and there is no need to store the whole file in memory. All sorts of other sequences can best be treated lazily too; they might depend on data arriving incrementally over a channel, or upon a computation that proceeds step-by-step.

Most of the time, an iterator is utilized within a for loop, just like a true sequence is. Iterators provide a .next() method that can be invoked explicitly, but 99 percent of the time, what you will see is along the lines of:

for x in iterator:
    do_something_with(x)

The loop is terminated when the behind-the-scenes call to iterator.next() raises a StopIteration exception. By the way, a true sequence can be turned into an iterator by calling iter(seq) -- this will not save any memory, but it can be useful in the functions discussed below.


Python's evolving split personality

There is something schizophrenic in Python's attitudes towards functional programming (FP). On the one hand, many Python developers disparage the traditional FP functions map(), filter(), and reduce(), usually recommending using list comprehensions in their place. But the whole of the itertools module is composed of functions of the very same sort, merely operating on "lazy sequences" (iterators) rather than on completed sequences (lists, tuples). Furthermore, there is no syntax in Python 2.3 for "iterator comprehensions," which would seem to have to same motivation as list comprehensions.

I suspect Python will eventually grow some form of iterator comprehension, but it depends on finding a suitably natural syntax for them. In the meanwhile, we have a number of useful combinatorial functions in the itertools module. Broadly, each of these functions accept some parameters (usually including some basis iterators) and return a new iterator. For example, the functions ifilter(), imap(), and izip() are directly equivalent to the respective built-ins that lack the initial i.


Missing equivalents

There is no ireduce() in itertools, although it might seem natural; a possible Python implementation is:

Listing 1. Sample implementation of ireduce()
def ireduce(func, iterable, init=None):
    if init is None:
        iterable = iter(iterable)
        curr = iterable.next()
    else:
        curr = init
    for x in iterable:
        curr = func(curr, x)
        yield curr

The use case for ireduce() is similar to that for reduce(). For example, suppose you want to add a list of numbers contained in a large file, but stop when a condition is met. You could monitor the ongoing total with:

Listing 2. Adding and totaling a list of numbers
from operator import add
from itertools import *
nums = open('numbers')
for tot in takewhile(condition, ireduce(add, imap(int, nums)):
    print "total =", tot

A more real-world example is probably something like applying a stream of events to a stateful object, such as to a GUI widget. But even the simple example above shows the FP flavor of iterator combinators.


Basic iterator factories

All the functions in itertools can easily be implemented in pure Python as generators. The main point of including the module in Python 2.3+ is to provide standard behaviors and names for some useful functions. While programmers could write their own versions, each one would create a slightly incompatible variation in practice. The other point, however, is to implement iterator combinators in efficient C code. Using itertools functions will be a bit faster than writing your own combinators. The standard documentation shows equivalent pure-Python implementations for each itertools function, so there is no need to repeat those in this article.

The functions in itertools are basic enough -- and distinctly named enough -- that it probably makes sense to import all the names from the module. The function enumerate(), for example, might sensibly live in itertools, but is instead a built-in function in Python 2.3+. Notably, you can easily express enumerate() in terms of itertools functions:

from itertools import *
enumerate = lambda iterable: izip(count(), iterable)

Let's look first at the few itertools functions that do not use other iterators as a basis, but simply create iterators "from scratch." times() returns an iterator that yields an identical object multiple times; in itself, this capability is moderately useful, but it is really nice as a substitute for the superfluous use of xrange() and index variable to simply repeat an action. That is, rather than:

for i in xrange(1000):
    do_something()

You can now use the more neutral:

for _ in times(1000):
    do_something()

If no second argument is given to times(), it simply yields None repeatedly. The function repeat() is similar to times(), but unboundedly returns the same object. This iterator is useful either where a loop has an independent break condition, or within combinators like izip() and imap().

The function count() is a bit like a cross between repeat() and xrange(). count() returns consecutive integers unboundedly (starting at an optional argument). However, given that count() does not currently support overflow to longs correctly now, you might as well use xrange(n,sys.maxint) still; it's not literally unbounded, but for most purposes it amounts to the same thing. Like repeat(), count() is particularly useful inside other iterator combinators.


Combinatorial functions

A few of the real combinatorial functions in itertools have been mentioned in passing. ifilter(), izip(), and imap() act just as you would expect from their corresponding sequence functions. ifilterfalse() is a convenience so that you do not need to negate a predicate function in a lambda or a def (and it saves significant function call overhead). But functionally, you could define ifilterfalse() (approximately, ignoring the None predicate) as:

def ifilterfalse(predicate, iterable):
    return ifilter(lambda predicate: not predicate, iterable)

The functions dropwhile() and takewhile() partition an iterator by a predicate. The former ignores yielded elements until a predicate is fulfilled, the latter yields while a predicate holds. dropwhile() skips an indefinite number of initial elements of an iterator, so it might not begin iterating until after a delay. takewhile() starts right away, but terminates the iterator if the passed-in predicate become true.

The function islice() is basically just the iterator version of list slicing. You can specify a start, stop, and step, just as with regular slices. If a start is given, a number of elements are dropped until the passed-in iterator reaches the element of interest. This is another case where I think refinement to Python is possible -- the best thing would be for iterators to simply recognize slices, just as lists do (as a synonym for what islice() does).

One final function, starmap(), is a slight variation on imap(). If the function passed in as an argument takes multiple arguments, the iterable passed should yield tuples of the right size. Basically, this is the same as imap() with several iterables passed in, only with the collection of iterables previously combined with izip().


Beyond the basics

The functions included in itertools make for a good start. If nothing else, they encourage Python programmers to become more comfortable with utilizing and combining iterators. In a general way, widespread use of iterators is clearly important to the future of Python. But past what is included, there are some others that I would recommend for future updates to the module. You can easily use these immediately -- of course, if they are later included, the names or interface details could differ.

One category that would seem to be of general use is functions that take multiple iterables as arguments, then yield individual elements from each iterable. In contrast to this, izip() yields tuples of elements, and imap() yields values computed from the basis elements. The two obvious arrangements, to my mind, are chain() and weave(). The first is similar in effect to concatenation of sequences (but appropriately lazy). That is, where for plain sequences you might use, for example:

for x in ('a','b','c') + (1, 2, 3):
    do_something(x)

for general iterables, you could use:

for x in chain(iter1, iter2, iter3):
    do_something(x)

A Python implementation is:

Listing 3. Sample implementation of chain()
def chain(*iterables):
    for iterable in iterables:
        for item in iterable:
            yield item

With iterables, you might also combine several by interspersing them. There is no built-in syntax to do the same with sequences, but weave() itself works fine for completed sequences also. A possible implementation is below (Magnus Lie Hetland proposed a similar function on comp.lang.python):

Listing 4. Sample implementation of weave()
def weave(*iterables):
    "Intersperse several iterables, until all are exhausted"
    iterables = map(iter, iterables)
    while iterables:
        for i, it in enumerate(iterables):
            try:
                yield it.next()
            except StopIteration:
                del iterables[i]

Let me illustrate the behavior of weave() since it might not be immediately obvious from the implementation:

>>> for x in weave('abc', xrange(4), [10,11,12,13,14]):
...    print x,
...
a 0 10 b 1 11 c 2 12 13 3 14

Even after some iterators are exhausted, the remaining ones continue to yield values, until everything available is yielded at some point.

I will propose just one more possible itertools function. This one owes a lot to functional programming ways of conceiving problems. icompose() has a certain symmetry with the function ireduce() suggested above. But where ireduce() feeds a (lazy) sequence of values to a function, and yields each result, icompose() applies a sequence of functions to the return value of each predecessor function. A likely use of ireduce() is to feed a sequence of events to a long-lived object. icompose() might instead repeatedly pass an object to a mutator function that returns a new object. The first is a fairly traditional OOP way of thinking about events, while the second is more of an FP way of thinking.

Here is a possible implementation of icompose():

Listing 5. Sample implementation of icompose()
def icompose(functions, initval):
    currval = initval
    for f in functions:
        currval = f(currval)
        yield currval

Conclusion

Iterators -- conceived as lazy sequences -- are a powerful concept that opens new styles of Python programming. There is a subtle difference, though, between thinking of an iterator as just a data source and thinking of it in a sequential style. Neither way of thinking is more true per se, but the latter opens the path towards a combinatorial shorthand for manipulating programmatic events. The combinatorial functions in itertools (and especially some it might grow, like those I suggest) come close to a declarative style of programming. To my mind, these declarative styles are generally less error-prone, and more powerful.

Resources

  • Read the previous installments of Charming Python.
  • Iterators and simple generators (developerWorks, September 2001) gives a gentle introduction to generators -- introduced in Python 2.2 -- and iterators.
  • David looks at ways to reduce Python to a set of declarative elements in Create declarative mini-languages (developerWorks, February 2003).
  • The official documentation of itertools includes pure Python work-alikes for each of the functions, as well as general explanation.
  • Much of the inspiration for itertools -- as for previously introduced list comprehensions -- comes from the thoroughly lazy language Haskell. You can see some of the many similar functions available in Haskell's standard library. If that documentation of Haskell list functions looks entirely mysterious, you might first want to read some of the introductory parts in the Haskell 98 Report.
  • For an overview of functional programming in Python, check out David's "Functional programming in Python" series on developerWorks. Part 1 covers concepts of functional programming and shows how to implement functional techniques in Python. In Part 2, David demonstrates several intermediate and advanced functional programming concepts. And Part 3 illustrates additional capabilities, such as currying and other higher-order functions contained in the Xoltar Toolkit.
  • Find more Linux articles in the developerWorks Linux zone.

Comments

developerWorks: Sign in

Required fields are indicated with an asterisk (*).


Need an IBM ID?
Forgot your IBM ID?


Forgot your password?
Change your password

By clicking Submit, you agree to the developerWorks terms of use.

 


The first time you sign into developerWorks, a profile is created for you. Information in your profile (your name, country/region, and company name) is displayed to the public and will accompany any content you post, unless you opt to hide your company name. You may update your IBM account at any time.

All information submitted is secure.

Choose your display name



The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email address for privacy reasons.

Required fields are indicated with an asterisk (*).

(Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

 


All information submitted is secure.

Dig deeper into Linux on developerWorks


static.content.url=http://www.ibm.com/developerworks/js/artrating/
SITE_ID=1
Zone=Linux
ArticleID=11320
ArticleTitle=Charming Python: Using combinatorial functions in the itertools module
publish-date=06122003