[TIP] Pythoscope proposal
constant.beta at gmail.com
Mon Aug 18 20:43:19 PDT 2008
What you'll find below is a proposal for a project we (signed below)
have been thinking about for some time and have finally found the time
to work on. We feel test generation is a great idea, and probably the
only viable way for legacy system maintainers to remain sane. So, we
present to you a Python "code stethoscope", a tool for all codebase
doctors out there. We've already set up a homepage:
and a Launchpad project:
Let us know what you think by replying on TIP. You can also put your
comments on the wiki if you want. Enjoy. :-)
Signed off by:
Our mission statement
To create an easily customizable and extensible open source tool that
will automatically, or semi-automatically, generate unit tests for
legacy systems written in Python.
Pythoscope. Your way out of The (Lack of) Testing Death Spiral.
The milestones listed below are there to give you a general idea of
where we stand and where we want to go. Having said that, we plan to
work on the system the agile way, with requirements being fleshed out
(and, undoubtedly, numerous problems popping up) as we go. We
definitely want to keep our goals realistic and want to quickly arrive
at the point where our work is helpful to at least some of the real
projects out there. We hope to work closely with the Python testing
community in order to keep the project on the right track.
A rather tentative schedule for the milestones follows. We want to
complete milestone 2 pretty quickly, to start working with code as
soon as possible. Our plan is to complete milestone 6 in about a month
and then start working on what now looks like the hardest problem:
side effects.
Milestone 1 (The proposal): done
Milestone 2 (Architecture): August 20th
Milestone 3 (Static analysis): August 31st
Milestone 4 (Dynamic analysis): September 7th
Milestone 5 (Setup & teardown): September 14th
Milestone 6 (Side effects): September 21st
Milestone 1: Write a proposal and introduce the project to the Python community.
At the time of this writing, this milestone has just been completed. :-)
Milestone 2: Decide on an initial architecture.
In terms of architecture I basically see it divided into two parts.
The first part's responsibility is to collect and analyze information
about the legacy code and store it on disk. After that, the second
component jumps in and uses this information to generate unit tests.
This separation is nice in many ways. First of all, it clearly
isolates responsibilities. Second, it allows us to rerun the parts
independently. So, whether we want the tool to gather new information
from recently changed source code, or start from scratch with unit
test stubs for some old class, we can do it without touching the other
part of the system.
This separation should be understood mostly at the conceptual level.
Both parts will surely share some library code, and they may even end
up being invoked through the same script with an appropriate
command-line flag. The distinction is important because we may end up
using the collected information for things other than unit test
generation, for example a powerful source code browser, a debugger, or
a refactoring tool. This is a possible, but not certain, future. For
now we'll focus our attention on test generation, because we feel
this is the area of the Python toolbox that most needs improvement.
The information collector will accept directories, files and points of
entry (see dynamic code analysis description in milestone 4) to
produce a comprehensive catalog of information about the legacy
system. This includes things like names of modules, classes, methods
and functions, types of values passed and returned during execution,
exceptions raised, side effects invoked and more, depending on the
needs of the test generator. This is the part of the system that will
require the most work. The dynamic nature of Python, while it gives us
a lot of leverage and freedom, introduces specific challenges related
to code analysis. This will be a fun project, that I'm sure of. :-)
Milestone 3: Explore the test generation idea, ignoring the problem of
side effects for now, using static analysis only.
In this milestone we'll put life into the project. We'll focus on
implementing the foundations of the architecture. The analyzer will
only look at the code statically, with just enough logic to support
test stub generation. The generator will write test stubs to the
designated output directory for the specified modules, classes,
methods and stand-alone functions. The main way to interact with the
scripts will be to pass arguments to them, although we should keep the
door open for later addition of configuration files. I feel those
aren't necessary at such an early stage. Once we get a better idea of
what should be configurable and to what extent, we'll think about
There are quite a few tools that have explored Python source code analysis:
- source code style checkers, like pylint, PyChecker, and PyFlakes
- CodeInvestigator debugger
- Python4Ply, lexer and parser for Python
- cyclomatic complexity analyzer
- type annotators, like PyPy type annotator and annotate script
- Cheesecake codeparser.py
So we have plenty of examples of using the stdlib's (now deprecated)
compiler module. Any thoughts on whether we should use the new _ast
module instead would be appreciated. My guess is that we shouldn't,
because we don't want to force people into using the latest stable
version of Python. They may not have a choice while working on their
legacy systems.
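As a rough illustration of how little static analysis is needed for stub generation, here is a sketch that lists the names a stub generator would care about. Note the assumptions: it uses the present-day ast module from the standard library rather than the compiler module discussed above, and the function name list_testable_names and the returned dictionary layout are made up for this example.

```python
import ast


def list_testable_names(source):
    """Statically list top-level functions and classes with their methods.

    The source is parsed, never imported or executed.
    """
    info = {"functions": [], "classes": {}}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            info["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            info["classes"][node.name] = [
                item.name for item in node.body
                if isinstance(item, ast.FunctionDef)
            ]
    return info


# A tiny stand-in for a legacy module (hypothetical code).
LEGACY = '''
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

def format_balance(account):
    return "%.2f" % account.balance
'''

print(list_testable_names(LEGACY))
```

Because nothing is imported, this kind of pass is safe to run even on modules with import-time side effects, which is one reason to start with static analysis.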
Milestone 4: Dynamic code analysis, generation of basic test cases.
In this milestone we'll enter the realm of dynamic code tracking and
analysis. Using points of entry provided by the user we'll execute
real code and gather information about values passed and returned
(both normally and through exceptions). From that we'll generate
actual characterization test cases, with all the values gathered
during the run. Initially we'll stick with basic Python types, and
maybe simple derivatives, to handle complicated object creation in the
We have a smaller but strong set of examples of code tracing. Those
include coverage tools like coverage and figleaf, as well as the
type annotators mentioned earlier. I also developed a proof-of-concept
tool called ifrit, available here:
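Independently of ifrit (the following is not its code), the basic tracing idea can be sketched with sys.settrace from the standard library. The function record_calls and the record layout are hypothetical, and the sketch makes one simplifying assumption spelled out in the comments: a call is recorded as raising whenever an exception passes through its frame, even if the function catches it.

```python
import sys


def record_calls(entry_point, *args, **kwargs):
    """Run entry_point and record every Python-level call made on the way."""
    records = []

    def on_call(frame, event, arg):
        if event != "call":
            return None
        code = frame.f_code
        # Named parameters only; *args and **kwargs are left out for brevity.
        arg_count = code.co_argcount + code.co_kwonlyargcount
        record = {
            "name": code.co_name,
            "args": {name: frame.f_locals[name]
                     for name in code.co_varnames[:arg_count]},
        }
        records.append(record)

        def on_event(frame, event, arg):
            if event == "exception":
                # arg is (type, value, traceback). Simplification: an
                # exception caught inside the function is recorded too.
                record["raised"] = arg[0].__name__
            elif event == "return" and "raised" not in record:
                record["returned"] = arg
            return on_event

        return on_event

    sys.settrace(on_call)
    try:
        entry_point(*args, **kwargs)
    except Exception:
        pass  # the exception has already been noted in the records
    finally:
        sys.settrace(None)
    return records


def parse_age(text):
    return int(text)


def main():
    """A stand-in point of entry (hypothetical)."""
    parse_age("42")
    parse_age("old")  # raises ValueError


for record in record_calls(main):
    print(record)
```

Each record carries enough to write one characterization test: the function name, the arguments it saw, and either the value returned or the exception raised.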
Milestone 5: Code with setup (still no side effects).
During the dynamic run we'll get actual values, not necessarily with
any idea how to build them. We'll need a good quasi-serialization
mechanism in place that will basically take a live object and change
it to code that creates it. In this milestone we'll focus on creating
that mechanism in order to generate test cases with proper setup and
teardown. Generated code will include mocks, since depending on what
we're testing we may not need to create real objects. We'll probably
use one of the existing mocking solutions, since there are so many of
them.
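A very small sketch of the quasi-serialization idea, turning a live object into code that creates it, could look like this. It is only an illustration: the name to_source is hypothetical, and the handling of instances rests on a loud assumption, namely that the class accepts its instance attributes as keyword arguments to its constructor. Real legacy objects will often break that assumption, which is exactly why this milestone needs a proper mechanism.

```python
def to_source(value):
    """Return Python source text that rebuilds value (simple cases only)."""
    if value is None or isinstance(value, (bool, int, float, str, bytes)):
        return repr(value)
    if isinstance(value, list):
        return "[%s]" % ", ".join(to_source(item) for item in value)
    if isinstance(value, tuple):
        items = ", ".join(to_source(item) for item in value)
        # A one-element tuple needs its trailing comma.
        return "(%s,)" % items if len(value) == 1 else "(%s)" % items
    if isinstance(value, dict):
        pairs = ("%s: %s" % (to_source(key), to_source(item))
                 for key, item in value.items())
        return "{%s}" % ", ".join(pairs)
    if hasattr(value, "__dict__"):
        # Assumption: the constructor takes every instance attribute
        # as a keyword argument of the same name.
        arguments = ", ".join(
            "%s=%s" % (name, to_source(attribute))
            for name, attribute in sorted(vars(value).items()))
        return "%s(%s)" % (type(value).__name__, arguments)
    raise TypeError("don't know how to rebuild %r" % (value,))


class Point:
    """A stand-in for a class found in legacy code (hypothetical)."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


print(to_source({"origin": Point(0, 0), "path": [Point(1, 2)]}))
```

The resulting text can be pasted straight into a generated setUp method, provided the class name is importable from the test module.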
Milestone 6: Differentiate between pure and "side effecty" code.
As we slowly move into the domain of code with side effects, the first
thing we'll need is a method to differentiate "pure" code from "side
effecty" code. For all practical purposes we'll treat any code that
doesn't affect the outside world in any way as pure. Whether it
assigns values or destructively modifies its local variables doesn't
matter to us, as long as it keeps those operations encapsulated.
Sample sources of side effects we're interested in are global/class
variables, file system, databases, and IO operations. Finding whether
a module is import-safe may also be a possible problem we can spend a
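One possible starting point for telling the two kinds of code apart is a crude static heuristic like the sketch below. It is an assumption-laden illustration, not a proposed design: the names side_effect_hints and IMPURE_CALLS are hypothetical, the list of impure calls is a tiny hand-picked sample, and the check looks only at the function's own body, so an empty result does not prove purity.

```python
import ast

# Assumption: a tiny hand-picked sample of builtins that touch the outside
# world. A real tool would need a much longer, configurable list.
IMPURE_CALLS = {"open", "print", "input"}


def side_effect_hints(source):
    """Map each top-level function name to reasons it may not be pure."""
    hints = {}
    for function in ast.parse(source).body:
        if not isinstance(function, ast.FunctionDef):
            continue
        reasons = []
        for node in ast.walk(function):
            if isinstance(node, ast.Global):
                reasons.append("declares global: " + ", ".join(node.names))
            elif (isinstance(node, ast.Call)
                  and isinstance(node.func, ast.Name)
                  and node.func.id in IMPURE_CALLS):
                reasons.append("calls " + node.func.id)
        hints[function.name] = reasons
    return hints


# A tiny stand-in for a legacy module (hypothetical code).
LEGACY = '''
counter = 0

def add(a, b):
    return a + b

def log(message):
    global counter
    counter += 1
    print(message)
'''

for name, reasons in side_effect_hints(LEGACY).items():
    print(name, reasons or "no hints found")
```

Side effects hidden behind method calls or other modules are invisible to this check, so information from the dynamic run would have to complement it.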
Future: Tackle the different kinds of side effects.
We'll try to handle each kind of side effect one by one.
This will probably include coding around some common Python libraries
that deal with those side effects. Ideally we'll want to come up with
a simple interface to describe external state and the side effects
related to it, in a way that lets you configure the system to your
specific project's needs. Practically every legacy system out there
has its own custom database/network/you-name-it library. We would like
to make the process of customizing the tool to specific projects as
painless as possible.
Future: Quickcheck-style fuzzy testing, deriving code contracts.
Another idea that is worth exploring is taking information about
values passed around and deriving Eiffel-style contracts for
methods and functions. It would work like this:
1. Generate a random input of some chosen type. We could use some
function contract information gathered earlier, but if that's not
available we can continue anyway. Not only values of arguments should
vary, but their number as well (important for testing functions with
2. Call the function with generated input.
3. Record the result.
4. Generate a test case based on this.
Test cases don't have to be generated immediately; I'd rather see them
grouped by result (into equivalence classes) and put into
separate test cases.
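The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: inputs are small random integers, the names random_int_args, characterize and compare are hypothetical, and the sketch stops at grouping inputs by outcome rather than writing out test cases.

```python
import random
from collections import defaultdict


def random_int_args(rng, max_args=3):
    """Step 1: random input; both the values and their number vary."""
    count = rng.randint(0, max_args)
    return tuple(rng.randint(-10, 10) for _ in range(count))


def characterize(function, runs=200, seed=0):
    """Steps 2 and 3, then group the recorded inputs by outcome."""
    rng = random.Random(seed)  # seeded so that reruns are reproducible
    classes = defaultdict(list)
    for _ in range(runs):
        args = random_int_args(rng)
        try:
            # repr() keeps the outcome usable as a dictionary key.
            outcome = ("returned", repr(function(*args)))
        except Exception as error:
            outcome = ("raised", type(error).__name__)
        classes[outcome].append(args)
    return dict(classes)


def compare(a, b=0):
    """A stand-in for a legacy function (hypothetical)."""
    return (a > b) - (a < b)


for outcome, inputs in sorted(characterize(compare).items()):
    print(outcome, "e.g.", inputs[0], "(%d inputs)" % len(inputs))
```

Each key of the returned dictionary is one equivalence class, so step 4 could emit one test case per key using a few representative inputs from its list.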
Using this method we'll be able to come up with new test cases without
any user interaction, and possibly beyond normal system usage,
capturing "accidental" system behavior, which I'm guessing could be a
real time saver for legacy systems.
Design for testability obstacle
One of the greatest conceptual problems seems to be the fact that to
make testing possible, be it hand-written or generated, the system has
to be designed for testability. Michael Feathers' book "Working
Effectively with Legacy Code" is all about that topic. It suggests a
number of code refactorings that one can use to get given code under a
test harness. The book uses C++, Java and C# as examples, and
fortunately a lot of the problems listed don't apply to Python. The
dynamic nature of Python, duck typing, the lack of real private
variables and other so-called "unsafe" features give us a lot of
leverage here. We'll be dealing with this problem starting from
milestone 5 ("code with setup"), so we should know pretty early if my
worries are justified.
Any thoughts on the issue would be most appreciated.
 See The Death Spiral blog post:
 Only for Python 2.5 and higher, http://docs.python.org/dev/library/_ast
 See http://en.wikipedia.org/wiki/Design_by_contract
 Equivalence classes: groups of inputs that should result in the
same output or that should exercise the same logic in the system. By
organizing inputs in this manner we can focus tests on boundary values
of those classes.