Python testing frameworks: Selecting and running tests

The recent emergence of industrial-strength Python testing frameworks means that Python tests are being written more succinctly, more uniformly, and with better reporting of results than ever before. This article examines how the three most popular testing frameworks identify and gather tests, and what support they provide for writing entire layers of tests that share common setup and teardown code.

Brandon Craig Rhodes (brandon@rhodesmill.org), Software Engineer, Rhodes Mill Studios, Inc.

Brandon Craig Rhodes is the Editor-in-Chief of Python Magazine, and an independent web application consultant with more than a decade of experience with the Python language. He has maintained his PyEphem extension module, which provides an object-oriented interface to industrial-grade computational astronomy routines, for more than nine years, and it is used by astronomers on several continents. Brandon also coordinates the Python Atlanta user's group.



23 June 2009


The first article in this three-part series looked at the revolution that is occurring in Python testing thanks to standard testing frameworks like zope.testing, py.test, and nose. These support simpler test idioms, and can replace the ad-hoc code that projects have traditionally had to write and maintain for running their tests. The second article examined how these automated solutions search through a Python package to identify the modules that may contain tests.

This article takes the next step and asks what the frameworks do when they then introspect a test module to discover what tests live inside of it. It also looks at details like how common test setup and teardown is supported, or not supported, by the three frameworks.

Test discovery in the Zope framework

Once a list of interesting modules has been determined, how are actual tests inside of them discovered?

Turning first to the zope.testing framework, you discover something interesting about the Zope community. Rather than build big tools that solve several problems each, they tend to build smaller and more limited tools that are capable of being connected together. The zope.testing module, as a case in point, actually provides no mechanism itself for detecting tests at all!

Instead, zope.testing leaves it to each programmer to find the tests in each module that are worth running and put them together in a list. It looks in each test module for only a single thing: a test_suite() function, which it calls, expecting it to return an instance of the standard unittest.TestSuite class stuffed full of the tests that the module defines.

Some programmers using zope.testing just create and maintain this list of tests manually, in the test_suite() function. Others write custom code that takes some shortcuts for discovering what tests have been defined and are available. But the most interesting choice is to use another Zope package, z3c.testsetup, which has the same kind of capacity for automatically discovering individual tests in a package as do the other modern Python test frameworks.
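
In the simplest manual case, test_suite() does nothing more than build a TestSuite by hand. A minimal sketch might look like the following; the module name, test class, and test are invented for illustration:

# test_sample.py - a hand-maintained test_suite() function

import unittest

class ArithmeticTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(2 + 2, 4)

def test_suite():
    suite = unittest.TestSuite()
    suite.addTest(unittest.makeSuite(ArithmeticTests))
    return suite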

Again, this is a good illustration of how Zope programmers tend to write building blocks out of which frameworks can be built rather than large monolithic solutions. The z3c.testsetup package contains neither a command-line interface with which tests can be selected, nor any output module with which test results could be displayed; it relies entirely upon zope.testing for these capabilities.

In fact, z3c.testsetup users generally do not even use zope.testing for its ability to discover test modules. Instead, they short-circuit the zope.testing algorithm by leaving unaltered its default behavior of looking only for modules named test.py, and then providing only one module with that name in their entire source tree. In the simplest case, their test.py looks something like this:

import z3c.testsetup
test_suite = z3c.testsetup.register_all_tests(my_package)

This takes the task of test discovery away from zope.testing, and instead relies upon the more powerful mechanisms provided for discovery by z3c.testsetup itself.

There are several configuration options that can be provided to the register_all_tests() function. See the z3c.testsetup documentation for details, but only its basic behavior needs to be outlined here. Unlike all of the other frameworks this article discusses, z3c.testsetup does not, by default, care about the name of each Python module in a package, but about its content. It will examine all of the modules, and all of the .txt or .rst files, in a package and select the ones that specify a :Test-Layer: somewhere in their text. It then builds the suite of tests by combining all of the TestCase classes inside the modules and all of the doctest stanzas from inside the text files.
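
As a sketch of what this looks like in practice, a doctest text file that register_all_tests() would gather might resemble the following; the file name and contents are invented, and what matters is simply that the marker appears somewhere in the text:

A sample doctest file (sample.txt)

:Test-Layer: unit

The doctest stanza below is collected and run as a unit test:

    >>> 2 + 2
    4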

Using :Test-Layer: strings to mark files with tests is an interesting mechanism. It does have the disadvantage that, when browsing a package's files, a new programmer has to open every one of them, or at least grep for the :Test-Layer: string, in order to find where the tests are located. (Not to mention that z3c.testsetup obviously has to do the same thing; does this make it slower than frameworks that operate only on the filename?)

Also note, finally, that the Zope test frameworks support only tests that are either unittest.TestCase instances or doctests. As discussed in the first article in this series, the more modern Python testing frameworks also support plain Python functions as valid tests. This requires a different test detection algorithm, as you will see as you now turn your attention to these frameworks.


Test discovery in py.test and nose

The py.test and nose frameworks, as was discussed in the previous article, use similar but slightly different sets of rules to search through a Python package for the modules that they believe will contain tests. But both wind up in the same situation: with a list of modules that they must then inspect to find the functions and classes that the developer wants run as tests.

As you saw in the last article, py.test tends to select a single standard to which all projects using it are expected to conform, while nose allows far more extensive customization at the expense of predictable behavior. It is the same in this case: the rules by which tests are detected inside of a test module are fixed, invariant, and predictable for py.test, while they are flexible and customizable for nose. If a project uses nose for its testing, you will have to first visit the project's setup.cfg file before you know whether nose will be following its usual rules for detecting tests or whether it will be following different ones specific to this individual project.

Here are the procedures that py.test uses:

  • When py.test looks inside of a Python test module, it collects every function whose name starts with test_ and every class whose name starts with Test. It collects classes regardless of whether the class inherits from unittest.TestCase or not. (A short example module follows this list.)
  • Test functions simply get run, but test classes have to be searched for methods. Any methods whose names start with test_ are run as tests once the class has been instantiated.
  • The py.test framework shows a curious behavior if provided with a test class that happens to inherit from the standard Python unittest.TestCase class: even if the class has several attractive test_ methods, py.test will die with an exception if it does not also contain a runTest() method. But if such a method does exist, py.test then ignores it; the method has to exist for the class to be accepted, but it will never be run, because its name does not begin with test_.

    To fix this behavior, activate the framework's unittest plug-in, either in your project's conftest.py file, or by using the -p command line option:

        $ py.test -p unittest

    This causes py.test to make three changes to its behavior. First, instead of only detecting test classes whose names start with Test, it will also detect any other classes that inherit from unittest.TestCase. Second, py.test will no longer report an exception for TestCase subclasses that do not provide a runTest() method. And, third and finally, any setUp() and tearDown() methods on TestCase subclasses will be correctly run, in the standard fashion, before and after the tests that the class contains.
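
To make the default rules above concrete, here is a small test module, with invented names, whose contents py.test would collect without any configuration:

# test_strings.py - what py.test collects by default

def test_lowercase():              # collected: the name starts with test_
    assert 'ABC'.lower() == 'abc'

class TestStrings:                 # collected: the name starts with Test
    def test_upper(self):          # run: the method name starts with test_
        assert 'abc'.upper() == 'ABC'

    def helper(self):              # ignored: the name does not start with test_
        return 'not a test'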

The behavior of nose, while being more customizable, somehow winds up being simpler here:

  • When nose looks inside of a Python test module, it collects functions and classes that match the same regular expression that it uses for choosing test modules. (By default, this looks for names that include the word Test or test, but a different regular expression can be provided on the command line or in a configuration file.) A short example module follows this list.
  • When nose looks inside of a test class, it runs methods matching that same regular expression.
  • Without being asked, nose will always detect subclasses of unittest.TestCase and use them as tests. It will, however, use its own regular expression to determine which of their methods are tests, rather than using the standard unittest pattern of ^test.
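
Here is a comparable sketch for nose, again with invented names; the comments indicate how nose's default rules treat each item:

# test_arithmetic.py - what nose collects by default

import unittest

def test_addition():                        # matches the default pattern
    assert 1 + 1 == 2

class TestNumbers:                          # matches, even without inheriting from TestCase
    def test_subtraction(self):             # matches, so it is run as a test
        assert 3 - 1 == 2

class VerifyArithmetic(unittest.TestCase):  # collected anyway, because it is a TestCase
    def test_multiplication(self):
        self.assertEqual(3 * 3, 9)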

Generative tests

As you saw in the first article, both py.test and nose have made tests in Python vastly easier to write by supporting tests that are written as simple functions, like:

# test_new.py - simple test functions

def testTrue():
    assert True == 1

def testFalse():
    assert False == 0

Test functions, and more traditional test classes, are fine when all you want to do is check on a component's behavior in some single, specific circumstance. But what about when you want to do a long series of tests that are almost identical except for some of the parameters?

In order to make such cases easy to implement without having to cut-and-paste a dozen copies of your test function and then change the names to be unique, both py.test and nose support generative tests. The idea is that you supply a test function that is actually a generator, using its yield statement(s) to produce a series of functions together with the arguments with which you want them called. For example, to run a single test against each of your favorite Web browsers, you might write something like this:

# test_browser.py

def check(browser, page):
    t = TestBrowser(browser)
    t.load_page(page)
    t.check_status(200)

def test_browsers():
    for b in 'ie6', 'ie7', 'firefox', 'safari':
        for p in 'index.html', 'about.html':
            yield check, b, p

For generative tests, py.test offers one additional convenience. So that you can more easily tell the test runs apart, and thus understand the test report if one or more of them fail, the first item in each tuple that you yield can be a name that will be printed as part of the name of the test:

# Alternate yield statement, for py.test
...
yield 'Page %s browser %s' % (b,p), check, b, p

Generative tests should prove a much more attractive solution for parametrized tests than the rather awkward techniques that have been common in projects using homemade test runners, or in projects restricting themselves to what unittest was capable of.


Setup and teardown

A huge issue in designing and writing a test suite is how to handle common setup and teardown code. Many real-world tests do not resemble the very simple functions that this article has been using as examples; they have to do things like open a Web page in Firefox, click on a button labelled “Continue”, and then examine the result. Before the actual test even begins (by which I mean bringing up the page and clicking on the button), the test first has to complete several expensive steps.

Now, consider one hundred functional tests that all perform a test like this. They will each need to call a common setup routine just to get Firefox running before they can commence their own particular test. Combine this with the fact that there is probably teardown code that is necessary to undo what the setup did, and you wind up with over two hundred extra function calls in your test suite. Each of its functions will look like this:

# How test functions look if they each do setup and teardown

def test_index_click_continue():
    do_big_setup()          # <- the same in every test
    t = TestBrowser('firefox')
    t.load_page('index.html')
    t.click('#continue')
    t.check_status(200)
    do_big_teardown()       # <- the same in every test

To eliminate repetitious code like this, many testing frameworks provide a way to indicate, once, what setup and teardown code needs to run for entire groups of tests.

All three frameworks this article is looking at (zope.testing, py.test, and nose) support the standard setUp() and tearDown() routines of any unittest.TestCase classes that the programmer writes. Beyond this, though, the frameworks differ remarkably in the facilities they provide for common setup code.
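
As a reminder of what that standard protocol looks like, here is a minimal sketch of a TestCase with shared fixture code; the class and its tests are invented for illustration, and remember that py.test honors setUp() and tearDown() only once its unittest plug-in is active:

# test_fixture.py - the standard unittest setup and teardown protocol

import unittest

class FixtureTests(unittest.TestCase):
    def setUp(self):                      # runs before every test method
        self.numbers = [1, 2, 3]

    def tearDown(self):                   # runs after every test method
        del self.numbers

    def test_length(self):
        self.assertEqual(len(self.numbers), 3)

    def test_sum(self):
        self.assertEqual(sum(self.numbers), 6)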

Though zope.testing provides no extra support of its own for setup and teardown, the z3c.testsetup extension that was discussed above does something interesting with doctests. You will recall that it finds tests by looking for files with a :Test-Layer: specified somewhere in their text. The layer in a doctest can actually specify one of two different values. Marking a doctest as belonging to the unit layer means that it will be run without any special setup. But marking it as belonging to the functional layer means that it will run only after a framework setup function has been invoked.

Typically, :Test-Layer: functional tests are designed to be run when the Zope Web framework has been fully configured, so that they can create a test browser instance, send a request, and see what the Web framework returns as the response. By being willing to perform this setup on the doctest's behalf, z3c.testsetup saves large amounts of boilerplate code from having to be copied into each functional doctest.

One last convenience, which also reduces boilerplate code, is that z3c.testsetup can be given a list of variables to pre-load into the namespace of each unit doctest, and another to be pre-loaded for functional doctests. This eliminates the need to cut-and-paste a common tangle of import statements to the top of every doctest file.

Moving on to py.test: by default, it provides no support for setup and teardown. It does not even run the setUp() and tearDown() methods of standard unittest.TestCase classes unless you have turned on its unittest plug-in.

It is nose that really shines when it comes to supporting common test code. When discovering tests, nose keeps track of the context in which it found them. Just as it considers every test method inside of a unittest.TestCase subclass to be “inside” that class, and therefore governed by its setUp() and tearDown() methods, it also considers tests to live “inside” of their module, their enclosing package, and any packages above that. For nose, therefore, a test lives inside of not one but a series of concentric containers, any of which can contain setup code that gets run before the test and teardown code that gets run afterward.

Read the nose documentation for more information about package-wide and module-wide setup and teardown functions; among other details, you will learn that you have a bewildering array of choices for what your setup and teardown functions can be called. (Once again, nose seems to have difficulty encouraging different projects to write tests the same way so that they can easily read each other's code.) But they are a very powerful way to make your groupings of functions into packages and modules not merely structural (they all got put here) but also semantic (the tests in here all run in the same environment).
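
As a sketch of what module-level fixture code can look like (setup_module() and teardown_module() are among the several names that nose accepts, and the shared resource here is invented for illustration), consider:

# test_shared.py - module-level setup and teardown under nose

connection = None                   # stands in for an expensive shared resource

def setup_module():                 # run once, before any test in this module
    global connection
    connection = {'open': True}

def teardown_module():              # run once, after every test in this module
    global connection
    connection = None

def test_connection_is_open():
    assert connection['open']

def test_connection_is_shared():
    assert connection is not None   # both tests see the same shared object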

There is one case in which nose does not care about the name of setup and teardown functions: when you specify them explicitly for a particular function using the @with_setup decorator. Again, if this interests you, consult the nose documentation; here, I will only take the space to note that, since functions are first-class objects in Python, you can assign a name to a particular decorator and use it over and over again:

# Naming a with_setup decorator

from nose.tools import with_setup

firefox_test = with_setup(firefox_setup, firefox_teardown)

@firefox_test
def test_index_click():
    ...

@firefox_test
def test_index_menu():
    ...

One final distinction: while the setup and teardown functions specified in a @with_setup decorator, or provided as methods in a unittest.TestCase subclass, get run once for each function or test that they wrap, the setup and teardown code that you give nose at the module or package level gets run only once for the entire set of tests. Do not, therefore, expect such tests to be properly isolated from each other: they will share a single copy of any resources that you create in the module's or package's setup routine.


Conclusion

Congratulations! You now understand how the different testing frameworks will support you (or fail to support you) by detecting your tests and arranging for them to run. The last article in this series will look at the payoff for all of the work that a framework puts into collecting your tests: the powerful test-selection options, reporting tools, and debugging support that make test results useful. And, in conclusion, it will consider how to choose from among the three frameworks the one best suited to your needs.
