Released in October 2000, Python 2.0 introduces a number of new language features and includes some new standard modules. One of Guido van Rossum's virtues -- probably the one that best earns him the affectionate title "benevolent dictator for life (BDFL)" in the Python community -- is his conservatism in changing Python. He accepts very few changes between Python versions, and what does change tends to be considered and discussed for months or years in advance. This makes for great backward and forward compatibility in Python, and for consistency in running Python programs across platforms and versions. That said, Python 2.0 represents a pretty large jump beyond the Python 1.5x language definition. Fortunately, Python 2.0 still maintains great backward compatibility, and the changes that have been made are generally very "pythonic" in character.
By the way, it is worth noting that a short-lived Python 1.6 was released in September 2000. This release is a bit of a curiosity: it exists because of contractual obligations of the Python core development team, who were moving to a new organizational home while 1.6 and 2.0 were being developed. For the most part Python 1.6 resembles Python 2.0, but if you are installing a new version, it is better to install Python 2.0.
Check the Resources for an exhaustive summary of changes. This article contains a subjective evaluation of what I find most important and interesting; some of the changes that interest you might not be addressed here.
For me, probably the most exciting new feature of Python 2.0 is the addition of list comprehensions. (Any oddball readers with a math background may note that this capability is sometimes called "ZF-comprehension" in other functional languages, after the Axiom of Comprehension in Zermelo-Fraenkel set theory.)
Most readers will note an odd expression in the previous paragraph: "other functional languages." As a Python programmer you have been programming in a (mixed) functional language since Python 1.0. Of course, if you are not in the habit of using the lambda keyword and the built-in functions map(), reduce(), and filter(), you have not been using these functional features. But even if you do use them, Python has always made it easy to avoid thinking about functional paradigms.
In any case, list comprehensions do much of what Python's functional built-ins do, but in a more compact form that is also easier to read and understand. Let's start out with a simple example of list comprehensions in action:
Example of list comprehensions
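As a minimal sketch, assuming two made-up lists (nums and words are hypothetical names), a comprehension that draws each tuple element from a different list and keeps only elements satisfying a test might look like this:

```python
# Hypothetical input lists, chosen only for illustration
nums = [1, 2, 3, 4, 5, 6]
words = ['spam', 'eggs', 'ham']

# Pair every even number with every word longer than three letters.
# Each 'if' clause prunes the elements of the list just before it.
pairs = [(n, w) for n in nums if n % 2 == 0
                for w in words if len(w) > 3]

# Without any 'if' clause we get every combination: 6 * 3 = 18 tuples.
every_combination = [(n, w) for n in nums for w in words]
```

Here pairs ends up as [(2, 'spam'), (2, 'eggs'), (4, 'spam'), (4, 'eggs'), (6, 'spam'), (6, 'eggs')].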
In the example we created a list of tuples where each tuple element is drawn from another list, and where each list element satisfies some property. Without any if clause, we would simply get every combination of elements from the lists (their Cartesian product, which is often useful in itself); with the if clause we can prune the list the way filter() would. Multiple if clauses are allowed in one list comprehension, by the way.
List comprehensions add no fundamentally new capability; the same effect could certainly be achieved in Python 1.5x, just less clearly. For example, the following more verbose techniques do the same thing:
Comparison of techniques in version 1.5x
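For concreteness, here is a sketch of two pre-2.0 ways to build the same kind of filtered list of tuples from two hypothetical lists: nested procedural loops, and filter()/map() with lambda. It is written so that it also runs on a current interpreter.

```python
# Hypothetical input lists, for illustration only
nums = [1, 2, 3, 4, 5, 6]
words = ['spam', 'eggs', 'ham']

# Technique 1: nested procedural loops with explicit tests
pairs_loops = []
for n in nums:
    if n % 2 == 0:
        for w in words:
            if len(w) > 3:
                pairs_loops.append((n, w))

# Technique 2: functional built-ins. filter() prunes each list, and map()
# pairs each surviving number with each surviving word. The default
# argument n=n is the pre-2.0 idiom for passing a loop variable into a
# lambda, since Python 1.5x had no nested scopes.
pairs_func = []
for n in filter(lambda n: n % 2 == 0, nums):
    pairs_func.extend(map(lambda w, n=n: (n, w),
                          filter(lambda w: len(w) > 3, words)))
```

Both techniques produce [(2, 'spam'), (2, 'eggs'), (4, 'spam'), (4, 'eggs'), (6, 'spam'), (6, 'eggs')].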
In the example I have given, the nested procedural loops are clearer than the functional-style calls (perhaps readers will notice a better functional approach). But both are far less clear than the list comprehension style.
With some programmer practice, list comprehensions can substitute for most uses of functional-style built-ins and also for many nested loops.
One new built-in function in Python 2.0 is particularly useful in conjunction with list comprehensions. You can think of what zip() does by imagining the teeth of a zipper: two or more sequences are combined into a list of tuples, with each tuple holding one element from each argument sequence. This is often useful when you do not want a list comprehension that ranges over every combination of elements from several lists, but merely one that uses corresponding elements of those lists. For example:
The zip() function
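A small sketch, assuming two hypothetical parallel lists of names and scores, shows zip() pairing corresponding elements and a comprehension that uses those pairs:

```python
names = ['alice', 'bob', 'carol']
scores = [90, 75, 82]

# Pair corresponding elements. zip() returned a list in Python 2.0;
# list() is needed on Python 3, where zip() returns an iterator.
pairs = list(zip(names, scores))

# Use corresponding elements only, rather than every combination
passed = [name for (name, score) in zip(names, scores) if score >= 80]

# zip() stops at the end of the shortest input sequence
short = list(zip([1, 2, 3], 'ab'))
```

Here pairs is [('alice', 90), ('bob', 75), ('carol', 82)] and passed is ['alice', 'carol'].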
Another big addition for Python 2.0 is Unicode support. If you need to use multinational character sets in your programs, this capability is absolutely essential. Of course, if, like me, you have not had any specific requirement for characters outside ASCII, the Unicode support does not really matter. Fortunately, the implementation of Unicode in Python 2.0 is extremely well designed, and does not get in the way of anything else.
Unicode strings may be written in several ways. Unicode string literals use the new quoting syntax u"string". This is very similar in style to the r"string" quoting style used for composing regular expressions without resolving escape codes at the Python level (because regular expressions use some of the same escape codes). Inside a Unicode literal, any character can be given with the escape sequence "\uHHHH", where HHHH is a four-digit hexadecimal number. Of course, to type non-ASCII characters directly between the quotes, you need a text editor capable of entering them.
Conversion between 8-bit strings and Unicode strings -- and also between different Unicode encodings -- is performed using the new codecs module.
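As a sketch of such conversions, the following uses codecs.encode() and codecs.decode(); these module-level helpers come from later Python releases, and the exact 2.0-era codecs API differs somewhat (for instance, codecs.lookup()). The string value is a made-up example.

```python
import codecs

# u"..." literal with a \uHHHH escape: U+00E9 is 'e' with an acute accent
word = u"caf\u00e9"

# Unicode string -> 8-bit bytes, in two different encodings
utf8_bytes = codecs.encode(word, 'utf-8')      # 5 bytes: the accent takes two
latin1_bytes = codecs.encode(word, 'latin-1')  # 4 bytes: one byte per character

# 8-bit bytes -> Unicode string again
from_utf8 = codecs.decode(utf8_bytes, 'utf-8')
from_latin1 = codecs.decode(latin1_bytes, 'latin-1')
```

Decoding either byte string with its own encoding gives back the original four-character Unicode string.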
Another nice syntax enhancement was made to function calls. It is now possible to call a function directly with a tuple of positional arguments and/or a dictionary of keyword arguments, unpacking them at the call site with * and **. As with list comprehensions, no fundamentally new capability is added, but the expression of some common chores is clearer and more concise. Methods in Python, of course, are just functions that are bound to class instances, so everything works the same for functions and methods.
Python programmers will be familiar with the existing syntax for collecting extra positional and keyword arguments in a function definition. For example, we might have:
Defining extra positional and keyword arguments
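A minimal sketch of that definition syntax, with a hypothetical function named report:

```python
def report(first, *rest, **options):
    # 'rest' is a tuple of any extra positional arguments;
    # 'options' is a dictionary of any extra keyword arguments.
    return first, rest, options

result = report(1, 2, 3, color='red', size=10)
```

The call returns (1, (2, 3), {'color': 'red', 'size': 10}).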
Python 2.0 extends the same * and ** convention from function definitions to function calls. For example:
Convention for function calls
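A sketch of the call-side syntax, assuming a hypothetical ordinary function add3 and argument containers that might have been built at runtime:

```python
def add3(a, b, c):
    # An ordinary function with three named parameters
    return a + b + c

args = (1, 2)        # positional arguments, perhaps built at runtime
kwargs = {'c': 10}   # keyword arguments, perhaps built at runtime

# Python 2.0 call syntax: * unpacks the tuple, ** unpacks the dictionary.
total = add3(*args, **kwargs)   # same as add3(1, 2, c=10), giving 13

# Before 2.0 this was spelled apply(add3, args, kwargs); apply() was
# later removed (it does not exist in Python 3), so it is not called here.
```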
Achieving the same effect (passing arguments held in a tuple and a dictionary, perhaps ones created dynamically at runtime) was always possible in Python. But the new calling syntax is more convenient than the older apply() function.
Python now has a shortcut in assignments that will be familiar to programmers of C, Perl, Awk, Java, and a variety of other languages. It is now possible to put an operator immediately before the equal sign to change the value of a variable based on its old value. For example:
New shortcut in assignments
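A few augmented assignments on hypothetical variables, each with its spelled-out equivalent in a comment:

```python
count = 10
count += 5        # same as: count = count + 5   (now 15)
count *= 2        # same as: count = count * 2   (now 30)
count -= 4        # same as: count = count - 4   (now 26)

text = "spam"
text += ", eggs"  # works for strings too: "spam, eggs"
```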
For numbers, strings, and other immutable objects, an augmented assignment such as x += y has the same effect as the spelled-out x = x + y: the result of the operation is bound back to the left-side name. In that sense, this is just syntactic sugar. Mutable objects such as lists, however, may perform the operation in place, so other names bound to the same object see the change.
Augmented assignments can also improve performance. I have not benchmarked it myself, but discussion suggests that an augmented assignment saves a lookup (the target is evaluated only once) and, for objects updated in place, some object allocation. For numbers and strings, which are immutable, this is insignificant; but if you happen to be working with large lists, in-place augmented assignment can speed things up and reduce memory usage.
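One place where += and the spelled-out form visibly differ is a mutable object with a second name bound to it. In this sketch (all names hypothetical), a list's += extends the existing list object in place, while b = b + [3] builds a new list and rebinds only b:

```python
a = [1, 2]
alias_a = a
a += [3]             # list += extends the same list object in place

b = [1, 2]
alias_b = b
b = b + [3]          # '+' builds a new list; only the name b is rebound

same_a = alias_a is a    # True: alias_a sees [1, 2, 3]
same_b = alias_b is b    # False: alias_b still holds [1, 2]
```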
Python's memory management is probably a pretty arcane issue for most day-to-day Python programmers. Traditionally, Python has used a reference-counting scheme to delete objects once nothing refers to them any longer. However, a reference-counting methodology is theoretically prone to leaking memory if cyclic references are used in a program. For example, this code defeats reference counting:
Cyclical references in Python
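A minimal sketch of such a cycle, assuming a hypothetical class Node and attribute name me, uses sys.getrefcount() to show the extra reference that the self-reference adds:

```python
import sys

class Node:
    # A throwaway class, used only to create an object
    pass

myobject = Node()
before = sys.getrefcount(myobject)
myobject.me = myobject        # the object now holds a reference to itself
after = sys.getrefcount(myobject)
del myobject                  # drops the name's reference, but the
                              # self-reference keeps the count above zero
```

The count goes up by one when the self-reference is made; del then removes only the reference held by the name.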
At this point, it is impossible to access myobject, but it will not have been deleted, since the reference count was incremented twice but only decremented once.
As bad as this might sound, most programmers will never experience any actual problems due to code like the above. In most cases, cyclical references will not be used in the first place, and even if they are, the memory leakage will usually be small (you can easily construct artificial cases of dangerous behavior; for example, add a myobject.big='#'*10**6 to the above example).
In any case, Python 2.0 adds a compile-time option for a garbage collector (GC) that detects and reclaims unreachable reference cycles. Most distributions of Python 2.0 seem to be compiled using this option; but if you need to, you can compile your own Python version with the garbage collection option turned off. In either case, reference counting is still used; it is just a question of whether leaks like the above are cleaned up.
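To watch the collector reclaim a self-referencing object like the one described above, here is a sketch using the gc module. It observes the object through the weakref module, which arrived in Python 2.1, so it assumes a later interpreter; the class name Node is hypothetical.

```python
import gc
import weakref

class Node:
    # A throwaway class; the instance will point at itself.
    pass

gc.disable()  # keep automatic collection from running mid-demonstration
try:
    obj = Node()
    obj.me = obj                  # build a reference cycle
    probe = weakref.ref(obj)      # observe obj without keeping it alive
    del obj                       # refcount stays above zero because of the cycle
    leaked = probe() is not None  # reference counting alone did not free it
    found = gc.collect()          # run the cycle collector explicitly
    reclaimed = probe() is None   # the collector freed the cycle
finally:
    gc.enable()
```

gc.collect() works even while automatic collection is disabled, which is why the sketch can control exactly when the cycle is reclaimed.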
On some platforms, like embedded systems, GC may be undesirable. Garbage collection takes some CPU cycles (not a lot, but some). Perhaps more importantly, reference counting is deterministic in program behavior, while garbage collection is not. That is, you never know for sure when a garbage collection will eat a few CPU cycles; therefore, using the GC version of Python may cause the identical program to behave differently (in terms of timings) from run to run.
These issues are fascinating theoretically, but most programmers should just ignore them from here on out. Whatever Python distribution you pick up will almost certainly do the right thing for the platform you are using; unless you know enough to know exactly why you want to enable or disable GC, I recommend not worrying about it.
As good a job as van Rossum and the rest of the team have done with Python 2.0, they also introduced one wart into Python. It does something moderately useful, but in my opinion (and also in the opinion of many other Python programmers), it introduces a brand new (and ugly) syntactic feature where none is needed. Most programmers suspect this imperfection is merely a ruse, however, to make the simplicity and beauty of the rest of Python shine even more brightly.
The print statement performs a bit of magic that the .write() method of file objects does not (and sys.stdout is just another file object that print writes to). The print statement accepts multiple arguments, each of any Python type, and writes them converted to strings and separated by spaces. By default each print statement writes its output on its own line; a trailing comma suppresses the newline, so the next print statement continues the same line. Overall, print is just a handy way to get some information from a program to the console.
A lot of Python programmers have wanted that same print mojo to be available for writing to other file objects (such as sys.stderr, regular files, or "file-like" objects that various modules provide). I think the right way to do this would be to add a .print() method to file objects and do the magic there. But Python 2.0 adds this capability by adding the "redirection operator" >> to the print statement. For example:
The redirection operator in the print statement
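The Python 2.0 statement form is shown in the comments below; it is a syntax error under Python 3, so the runnable part is a sketch using the later print() function, whose file keyword argument plays the role of >>. An in-memory io.StringIO buffer stands in for a real file, and the name logfile is hypothetical.

```python
import io

# Python 2.0 statement form (not valid Python 3 syntax):
#     print >>logfile, "Warning:", 42
#     print >>logfile, "one", "two",      # trailing comma: no newline
#     print >>logfile, "three"
logfile = io.StringIO()
print("Warning:", 42, file=logfile)
print("one", "two", end=" ", file=logfile)   # end=" " replaces the newline
print("three", file=logfile)
contents = logfile.getvalue()
```

After these calls, contents holds "Warning: 42\none two three\n".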
This works -- and it adds a nice capability -- but it nudges Python just a hair closer to the "executable line-noise" feel of certain other programming languages.
- Read the previous installments of Charming Python.
- Check out the Python 2.0 CHANGELOG
- A.M. Kuchling and Moshe Zadka have written a good (and closer to official) summary of changes in Python 2.0, called "What's New in Python 2.0"
- A very nice ActiveState distribution of Python 2.0 bundles a number of useful tools that will not necessarily be found by default in other distributions
Since conceptions without intuitions are empty, and intuitions without conceptions, blind, David Mertz wants a cast sculpture of Milton for his office. Start planning for his birthday. David may be reached at mertz@gnosis.cx; his life pored over at http://gnosis.cx/dW/. Suggestions and recommendations on this, past, or future columns are welcome.