[TIP] LNCS: Accuracy of test cases generated by pythoscope WAS: Test coverage and parsing.
constant.beta at gmail.com
Mon Oct 12 11:05:30 PDT 2009
2009/10/12 Olemis Lang <olemis at gmail.com>:
> `pythoscope` may be considered as an MB test generator that considers
> as input the (legacy) code itself (... or an equivalent representation
> like ASTs ;o) and employs that model to generate test cases. If this
> is correct (CMIIW) then users are facing the situation mentioned
> above: «tests and the implementation are not sufficiently independent»
> and «any errors in that model (i.e. legacy code) will generate
> incorrect implementation code (i.e. they are both the same thing), but
> the tests will contain the same errors, so no test failures will be
> observed».

The conclusion is correct, but the premise is not. Pythoscope is not a
model-based test generator. I think it's very important to distinguish
a model from the implementation. Models represent intent, while the
implementation is a mere projection of the model - it may or may not
do what was intended. You could say I define "model" as a Platonic
idea that can never be fully realized.
There is of course the case of a model so precise that the code can be
generated from it, but IMHO
1) this is evil :-), and
2) creating such a model *is* programming anyway, which exposes the
model to the same danger of having implementation flaws.
> - Is it possible that something like that be happening
> if using `pythoscope` ?
> - If true ... Is there any way (provided by pythoscope or not ;o)
> to overcome this ?

Now a few words about Pythoscope.
Pythoscope is, for the most part, a regression test generator. Its
main use case is generating test cases for a legacy system (i.e. one
with no tests). The main problem with maintaining legacy systems is
regressions. Any change you make to the code base could break some
functionality, and without a test suite there's no way to check for
that.

Now, the solution is to use a tool like Pythoscope to generate a set
of characterization tests. Those tests will capture the actual
behavior of the system, not the intended behavior. In other words,
there will be both tests that check for good behavior and tests that
check for buggy behavior. From the point of view of a developer it
doesn't matter - she just wants to add this one feature or fix this
one bug. :-) Once she has those auto-generated tests she can actually
tell whether a regression occurred or not. Whether a regression means
a fixed bug or real breakage is up to her to decide, on a
case-by-case basis.
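To make the idea concrete, a characterization test in the sense above
might look like the sketch below. The function, its quirk, and the
values are all hypothetical; Pythoscope's real generated output is
unittest-based but differs in detail.

```python
import unittest

# Hypothetical legacy function. It truncates the discount to a whole
# amount -- bug or feature? The code alone doesn't say.
def apply_discount(price, percent):
    return price - int(price * percent / 100)

# A characterization test in the spirit of Pythoscope's output: it
# pins down what the code *does* today, not what it *should* do.
class TestApplyDiscount(unittest.TestCase):
    def test_truncates_discount_to_whole_units(self):
        # Actual behavior: 99 - int(99 * 15 / 100) == 99 - 14 == 85
        self.assertEqual(apply_discount(99, 15), 85)

    def test_exact_percentage_needs_no_truncation(self):
        self.assertEqual(apply_discount(100, 50), 50)
```

If a later "fix" replaces the truncation with proper rounding, the
first test fails, and the developer decides whether that is a repaired
bug or broken behavior some client depends on.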

Legacy systems often lack not only tests but also documentation, and
sometimes it's hard to tell whether something is really a bug or a
feature that some third-party application may depend on. Caution is
advised, and Pythoscope can give you a safety net.

I think it's now clear that you can't fully overcome the limitations
of this technique. More often than not a developer alone can't figure
out what the desired behavior is and needs help - from documentation,
a user, or a fellow developer who has worked on the system a bit
longer. A small tool like Pythoscope simply doesn't have enough
information to figure those things out by itself.

Hope that answered your questions. :-)
More information about the testing-in-python mailing list