[TIP] Model checking Python code

Mark Seaborn mrs at mythic-beasts.com
Fri Sep 4 10:45:57 PDT 2009


I recently wrote up some notes on how to do model checking in Python
based on the technique used in eXplode:
http://lackingrhoticity.blogspot.com/2009/08/how-to-do-model-checking-of-python-code.html

(This is somewhat misnamed because it is not *model* checking -- it
involves testing real code, not a model of the code -- but I don't have
a better term for it yet.)

So far I have only used this technique on a small scale, to generate
test input, but I'd like to see if people here have ideas about how this
technique could be put to use in test frameworks.

Model checking has traditionally been used to explore systems with large
choice trees.  eXplode tests storage systems (Linux filesystems and
VCSes) for crash-resistance, and the choice tree of the test is
infinite.  It can take days to explore enough of the choice tree to
discover bugs.  (However, once the path to a bug is found it can be
reproduced quickly.)

However, I think this framework is also useful for quick-running tests
with much smaller (finite, non-exponential) choice trees.

For example, it provides a way to parameterise tests to test multiple
implementations of an interface.

The obvious way to test multiple implementations is with a "for" loop:

import unittest

class Test(unittest.TestCase):
    def test(self):
        # frob() is the code under test; exercise it with each implementation.
        for impl in [ImplementationA, ImplementationB]:
            frob(impl())

However, this has problems:
      * If the test fails, the failure message doesn't tell you which
        implementation produced the failure.
      * A failure in ImplementationA blocks failure messages about
        ImplementationB.
      * You can't run the tests for implementations A and B selectively.
      * It's not always convenient to insert a loop over
        implementations.  The implementation may be used in the middle
        of the test, after other objects (possibly mutable and
        non-reusable) have been created, so the "for" loop would have
        to go at the top of the test function, far from where the
        implementation is used.  Furthermore, the implementation might
        be used inside a nested function rather than in the top-level
        test function, making it harder still to hoist a "for" loop
        into the top-level test function.

What I envisage is that parameterised tests can use the choose()
function of the model checking framework to select between
implementations to test.

e.g.

def test(chooser):
    impl = chooser.choose([("implA", ImplementationA),
                           ("implB", ImplementationB)])
    frob(impl())

Conceptually, this causes execution to fork.  The labels "implA" and
"implB" are there to allow the check runner to tell you which branches
failed if failures occur, and they can also be used to filter which
choices are taken so that tests can be run selectively.
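
To make this concrete, here is a minimal sketch of a check runner that
implements choose() using the eXplode-style replay strategy: the test
is re-run once per path through the choice tree, with a Chooser that
replays a recorded prefix of branch indices and then defaults to the
first branch, queueing the sibling branches for later runs.  This is a
simplified illustration (the names Chooser and check() and the failure
reporting are just placeholders), not the actual framework from the
blog post:

class Chooser(object):

    def __init__(self, prefix, queue):
        self._prefix = prefix  # Branch indices to replay, in order.
        self._queue = queue    # Unexplored paths, shared with check().
        self._pos = 0
        self.labels = []       # Labels of branches taken, for reporting.

    def choose(self, options):
        if self._pos < len(self._prefix):
            # Replay a branch chosen on an earlier run.
            index = self._prefix[self._pos]
        else:
            # First time past the recorded prefix: take branch 0 and
            # queue the remaining branches for later runs to explore.
            index = 0
            for i in range(1, len(options)):
                self._queue.append(self._prefix + [i])
            self._prefix = self._prefix + [index]
        self._pos += 1
        label, value = options[index]
        self.labels.append(label)
        return value

def check(test_func):
    # Explore every path through the choice tree, reporting each path's
    # outcome along with the labels of the choices that were made.
    queue = [[]]
    while queue:
        chooser = Chooser(queue.pop(0), queue)
        try:
            test_func(chooser)
        except Exception as exc:
            print("FAIL %s: %s" % ("/".join(chooser.labels), exc))
        else:
            print("PASS %s" % "/".join(chooser.labels))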

The fact that choose() returns a single value means that a "for" loop is
not required.  A choice can easily be made in a convenience function
that is shared between tests, e.g.

def make_implementation(chooser):
    return chooser.choose([("implA", ImplementationA),
                           ("implB", ImplementationB)])

def test1(chooser):
    frob(make_implementation(chooser))

def test2(chooser):
    munge(make_implementation(chooser))

If there are multiple choices to be made, the multiplying out (to give
the Cartesian product of the options) is done by the check runner; this
amounts to a kind of inversion of control.
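
For example, with the sketch above, a test that makes two independent
choices is run four times, once per combination of labels.  (The
second choice here is invented purely for illustration.)

def test_both(chooser):
    impl = chooser.choose([("implA", ImplementationA),
                           ("implB", ImplementationB)])
    count = chooser.choose([("once", 1),
                            ("twice", 2)])
    for _ in range(count):
        frob(impl())

# Explores implA/once, implA/twice, implB/once and implB/twice.
check(test_both)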

I'd be interested to hear whether anyone has used this kind of
technique in Python before, or whether anyone thinks it would help
solve problems they've encountered.

Mark


