[TIP] Pythoscope proposal

Ryan Freckleton ryan.freckleton at gmail.com
Tue Aug 19 21:46:22 PDT 2008

This is really awesome!

Two other projects you may like for reference are pester
[http://jester.sourceforge.net/], a "mutation tester" for testing
tests, and rope [http://rope.sourceforge.net/], a refactoring
library that does static type inference as well as inference based on
running unit tests. Pester probably won't be useful until much later in
the project, but it looks like the rope inferencing and code analysis
may be useful.

On Mon, Aug 18, 2008 at 9:43 PM, Michał Kwiatkowski
<constant.beta at gmail.com> wrote:
> Hi list,
> What you'll find below is a proposal for a project we (signed below)
> have been thinking about for some time now and finally found time to
> work on. We feel test generation is a great idea, and probably the only
> viable way for legacy system maintainers to remain sane. So, we
> present a Python "code stethoscope", a tool for all the codebase
> doctors out there. We've already set up a homepage:
> http://pythoscope.org
> and a Launchpad project:
> https://launchpad.net/pythoscope
> Let us know what you think by replying on TIP. You can also put your
> comments on the wiki if you want. Enjoy. :-)
> Signed off by:
>  Titus Brown
>  Grig Gheorghiu
>  Paul Hildebrandt
>  Michal Kwiatkowski
> =====================
> Our mission statement
> =====================
> To create an easily customizable and extensible open source tool that
> will automatically, or semi-automatically, generate unit tests for
> legacy systems written in Python.
> ==========
> Slogan ;-)
> ==========
> Pythoscope. Your way out of The (Lack of) Testing Death Spiral[1].
> ==========
> Milestones
> ==========
> The milestones listed below are there to give you a general idea of
> where we stand and where we want to go. That said, we plan to work
> on the system the agile way, fleshing out requirements (and,
> undoubtedly, running into numerous problems) as we go. We
> definitely want to keep our goals realistic and to arrive quickly
> at the point where our work is helpful to at least some of the real
> projects out there. We hope to work closely with the Python testing
> community in order to keep the project on the right track.
> A rather tentative schedule for the milestones follows. We want to
> complete milestone 2 quickly, to start working with code as soon as
> possible. Our plan is to complete milestone 6 in about a month and
> then start working on what now looks like the hardest problem: side
> effects.
> Milestone 1 (The proposal): done
> Milestone 2 (Architecture): August 20th
> Milestone 3 (Static analysis): August 31st
> Milestone 4 (Dynamic analysis): September 7th
> Milestone 5 (Setup & teardown): September 14th
> Milestone 6 (Side effects): September 21st
> --------------------
> Milestone 1: Write a proposal and introduce the project to the Python community.
> -----
> At the time of this writing, this milestone has just been completed. :-)
> --------------------
> Milestone 2: Decide on an initial architecture.
> -----
> In terms of architecture, I basically see the tool divided into two
> parts. The first part's responsibility is to collect and analyze
> information about the legacy code and store it on disk. The second
> component then uses this information to generate unit tests.
> This separation is nice in many ways. First of all, it clearly
> isolates responsibilities. Second, it allows us to rerun the parts
> independently. So, whether we want the tool to gather new information
> from recently changed source code, or to start from scratch with unit
> test stubs for some old class, we can do it without touching the other
> part of the system.
> This separation should be understood mostly at the conceptual level.
> Both parts will surely share some library code, and they may
> even end up being invoked through the same script, selected by an
> appropriate command-line flag. The distinction is important because we
> may end up using the collected information for things other than unit
> test generation: for example, a powerful source code browser, a
> debugger, or a refactoring tool. That is a possible, but not certain,
> future. For now we'll focus our attention on test generation, because
> we feel this is the area of the Python toolbox that most needs
> improvement.
> The information collector will accept directories, files, and points of
> entry (see the dynamic code analysis description in milestone 4) to
> produce a comprehensive catalog of information about the legacy
> system. This includes things like the names of modules, classes,
> methods, and functions; the types of values passed and returned during
> execution; exceptions raised; side effects invoked; and more, depending
> on the needs of the test generator. This is the part of the system that
> will require most of the work. The dynamic nature of Python, while it
> gives us a lot of leverage and freedom, introduces specific challenges
> for code analysis. This will be a fun project, of that I'm sure. :-)
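To make the catalog idea concrete, here is a minimal sketch of such a
collector. It uses today's `ast` module purely for brevity of
illustration (the compiler-vs-_ast question comes up in milestone 3);
the catalog layout is invented for this example, not Pythoscope's
actual design:

```python
import ast

def collect_info(source, module_name="example"):
    """Build a minimal catalog of the classes, methods and
    functions defined at the top level of a module."""
    tree = ast.parse(source)
    catalog = {"module": module_name, "classes": {}, "functions": []}
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            # record each class together with its method names
            catalog["classes"][node.name] = [
                n.name for n in node.body
                if isinstance(n, ast.FunctionDef)]
        elif isinstance(node, ast.FunctionDef):
            catalog["functions"].append(node.name)
    return catalog

SOURCE = """
class Stack:
    def push(self, item): pass
    def pop(self): pass

def helper(): pass
"""
catalog = collect_info(SOURCE, "stack")
# catalog now maps the module to its classes, methods and functions
```

A real collector would of course also record the dynamic facts (value
types, exceptions, side effects) that only a traced run can provide.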
> --------------------
> Milestone 3: Explore the test generation idea, ignoring the problem of
> side effects for now, using static analysis only.
> -----
> In this milestone we'll put life into the project. We'll focus on
> implementing the foundations of the architecture. The analyzer will
> only look at the code statically, with just enough code to support
> test stub generation. The generator will write test stubs to a
> designated output directory for the specified modules, classes,
> methods, and stand-alone functions. The main way to interact with the
> scripts will be to pass arguments to them, although we should keep the
> door open for the later addition of configuration files. I feel those
> aren't necessary at such an early stage. Once we get a better idea of
> what should be configurable and to what extent, we'll think about
> configuration files.
> There are quite a few tools that have explored Python source code
> analysis. These include:
>  - source code style checkers, like pylint[2], PyChecker[3], and PyFlakes[4]
>  - CodeInvestigator[5] debugger
>  - Python4Ply[6], lexer and parser for Python
>  - cyclomatic complexity analyzer[7]
>  - type annotators, like PyPy type annotator[8] and annotate script[9]
>  - Cheesecake codeparser.py[10]
> So we have plenty of examples of using the stdlib's (now deprecated)
> compiler module[11]. Any thoughts on whether we should use the new _ast
> module[12] would be appreciated. My guess is that we shouldn't, because
> we don't want to force people to use the latest stable version of
> Python. They may not have a choice while working on their legacy
> applications.
> --------------------
> Milestone 4: Dynamic code analysis, generation of basic test cases.
> -----
> In this milestone we'll enter the realm of dynamic code tracking and
> analysis. Using points of entry provided by the user we'll execute
> real code and gather information about values passed and returned
> (both normally and through exceptions). From that we'll generate
> actual characterization test cases, with all the values gathered
> during the run. Initially we'll stick with basic Python types, and
> maybe simple derivatives, leaving complicated object creation to the
> next milestone.
> We have a smaller, but strong, set of examples of code tracing. These
> include coverage tools like coverage[13] and figleaf[14], as well as
> the type annotators mentioned earlier. I also developed a proof-of-concept
> tool called ifrit, available here:
> http://pythoscope.org/local--files/download/ifrit-r28.tar.gz .
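The gathering step this milestone describes can be sketched with the
stdlib's `sys.settrace` hook. The function names and record layout
below are invented for the example (and only Python-level calls are
seen; C functions don't raise trace events), but it shows the basic
idea of running a point of entry and recording arguments and return
values:

```python
import sys

def trace_calls(entry_point, *args):
    """Run an entry point under a trace function, recording the
    arguments and return value of every Python function called."""
    records = []

    def tracer(frame, event, arg):
        if event == "call":
            # snapshot the arguments at call time
            records.append({"name": frame.f_code.co_name,
                            "args": dict(frame.f_locals)})
        elif event == "return":
            # attach the return value to the most recent open call
            for rec in reversed(records):
                if rec["name"] == frame.f_code.co_name and "returns" not in rec:
                    rec["returns"] = arg
                    break
        return tracer

    sys.settrace(tracer)
    try:
        entry_point(*args)
    finally:
        sys.settrace(None)
    return records

# a tiny "legacy system" and a point of entry into it
def square(x):
    return x * x

def run(n):
    return square(n) + 1

records = trace_calls(run, 3)
```

Each record here is already enough raw material for a
characterization test: known inputs paired with observed outputs.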
> --------------------
> Milestone 5: Code with setup (still no side effects).
> -----
> During the dynamic run we'll get actual values, not necessarily with
> any idea of how to build them. We'll need a good quasi-serialization
> mechanism in place that will basically take a live object and turn it
> into the code that creates it. In this milestone we'll focus on
> creating that mechanism in order to generate test cases with proper
> setup and teardown. The generated code will include mocks, since
> depending on what we're testing we may not need to create real
> objects. We'll probably use one of the existing mocking solutions,
> since there are so many of them. ;-)
> --------------------
> Milestone 6: Differentiate between pure and "side effecty" code.
> -----
> As we slowly move into the domain of code with side effects, the first
> thing we'll need is a way to differentiate "pure" code from "side
> effecty" code. For all practical purposes we'll treat any code that
> doesn't affect the outside world in any way as pure. Whether it
> assigns values or destructively modifies its local variables doesn't
> matter to us, as long as it keeps those operations encapsulated.
> Sample sources of side effects we're interested in are global/class
> variables, the file system, databases, and IO operations. Finding out
> whether a module is import-safe is another problem we may spend some
> time on.
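A crude first cut at that pure/side-effecty distinction could be a
static scan for obvious markers: `global` declarations and calls to
known effectful builtins. This is only a heuristic sketch (the list of
flagged names is invented; a real tool would need dynamic tracing and
per-project configuration as described above):

```python
import ast

# names whose mere invocation we treat as a side effect here;
# a real tool would make this list configurable per project
SIDE_EFFECT_CALLS = {"open", "print", "input"}

def looks_pure(source):
    """Heuristically decide whether the given source is free of
    obvious side effects (global writes, file/IO calls)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            return False
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in SIDE_EFFECT_CALLS:
                return False
    return True
```

Local assignments and destructive updates of local data pass the
check, matching the "encapsulated operations are fine" rule above.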
> --------------------
> Future: Tackle the different kinds of side effects.
> -----
> We'll try to handle each kind of side effect one by one.
> This will probably include coding around some common Python libraries
> that deal with those side effects. Ideally we'll want to come up with
> a simple interface for describing external state and the side effects
> related to it, in a way that lets you configure the system to your
> specific project's needs. Practically every legacy system out there
> has its own custom database/network/you-name-it library. We would like
> to make the process of customizing the tool for specific projects as
> painless as possible.
> --------------------
> Future: QuickCheck-style fuzz testing, deriving code contracts.
> -----
> Another idea that is worth exploring is taking information about
> values passed around and deriving Eiffel-style[15] contracts for
> methods and functions. It would work like this:
>  1. Generate a random input of some chosen type. We could use any
> function contract information gathered earlier, but if that's not
> available we can continue anyway. Not only should the values of the
> arguments vary, but their number as well (important for testing
> functions with optional arguments).
>  2. Call the function with generated input.
>  3. Record the result.
>  4. Generate a test case based on this.
> Test cases don't have to be generated immediately; I'd rather see them
> grouped by result (into equivalence classes[16]) and put into
> separate test cases.
> Using this method we'll be able to come up with new test cases without
> any user interaction, and possibly beyond normal system usage,
> capturing "accidental" system behavior, which I'm guessing could be a
> real time saver for legacy systems.
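The four steps above, plus the grouping by result, can be sketched as a
small fuzz loop. Everything here is illustrative (random ints only, a
fixed seed, wrong arity treated as just another observed result):

```python
import random

def fuzz(func, trials=50, seed=0):
    """Call func with random ints, varying the argument count too,
    and group the observed (args -> result) pairs by result --
    a rough approximation of equivalence classes."""
    rng = random.Random(seed)
    classes = {}
    for _ in range(trials):
        nargs = rng.randint(1, 2)                 # vary the arity as well
        args = tuple(rng.randint(-5, 5) for _ in range(nargs))
        try:
            result = func(*args)
        except TypeError:                         # wrong arity is a result too
            result = "TypeError"
        classes.setdefault(result, []).append(args)
    return classes

def sign(x):
    return "neg" if x < 0 else "pos" if x > 0 else "zero"

classes = fuzz(sign)
# inputs that produced the same result end up in the same bucket,
# ready to become one characterization test case per bucket
```

From each bucket the generator could then pick representative and
boundary inputs, as footnote [16] suggests.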
> ===============================
> Design for testability obstacle
> ===============================
> One of the greatest conceptual problems seems to be the fact that to
> make testing possible, be it hand-written or generated, a system has
> to be designed for testability. Michael Feathers' book "Working
> Effectively with Legacy Code" is all about that topic. It suggests a
> number of refactorings that one can use to get a given piece of code
> under a test harness. The book uses C++, Java, and C# for its examples,
> and fortunately a lot of the problems listed don't apply to Python.
> The dynamic nature of Python, duck typing, the lack of real private
> variables, and other so-called "unsafe" features give us a lot of
> leverage here. We'll be dealing with this problem starting from
> milestone 5 ("code with setup"), so we should know pretty early
> whether my worries are justified.
> Any thoughts on the issue would be most appreciated.
> ==========
> References
> ==========
> [1] See The Death Spiral blog post:
> http://ivory.idyll.org/blog/mar-08/software-quality-death-spiral.html
> [2] http://www.logilab.org/857
> [3] http://pychecker.sourceforge.net/
> [4] http://divmod.org/trac/wiki/DivmodPyflakes
> [5] http://codeinvestigator.googlepages.com/main
> [6] http://www.dalkescientific.com/Python/python4ply.html
> [7] http://www.traceback.org/2008/03/31/measuring-cyclomatic-complexity-of-python-code/
> [8] http://codespeak.net/pypy/dist/pypy/doc/translation.html#annotator
> [9] http://www.partiallydisassembled.net/blog/?item=166
> [10] http://pycheesecake.org/browser/trunk/cheesecake/codeparser.py
> [11] http://docs.python.org/lib/module-compiler.html
> [12] Only for Python 2.5 and higher, http://docs.python.org/dev/library/_ast
> [13] http://nedbatchelder.com/code/modules/coverage.html
> [14] http://darcs.idyll.org/~t/projects/figleaf/doc/
> [15] See http://en.wikipedia.org/wiki/Design_by_contract
> [16] Equivalence classes: groups of inputs that should result in the
> same output or that should exercise the same logic in the system. By
> organizing inputs in this manner we can focus tests on boundary values
> of those classes.
> _______________________________________________
> testing-in-python mailing list
> testing-in-python at lists.idyll.org
> http://lists.idyll.org/listinfo/testing-in-python

--Ryan E. Freckleton
