[TIP] Test isolation // Detection of offending test

Thu Dec 6 04:51:12 PST 2012

On Thu, Dec 6, 2012 at 12:55 AM, Robert Collins
<robertc at robertcollins.net> wrote:
> On Thu, Dec 6, 2012 at 9:55 AM, Benji York <benji at benjiyork.com> wrote:
>> On Wed, Dec 5, 2012 at 4:50 PM, Andres Riancho <andres.riancho at gmail.com> wrote:
>>> Lists,
>>>
>>>     I've got a project with 500+ tests, and during the last month or
>>> so I started to notice that some of my tests run perfectly if I run
>>> them directly (nosetests --config=nose.cfg
>>> core/data/url/tests/test_xurllib.py) but fail when I run them together
>>> with all the other tests (nosetests --config=nose.cfg core/).
>> [snip]
>>>     I suspect that this is a common issue when testing.  How do you
>>> guys solve this?
>>
>> The way I have handled it is to use a test runner that can randomize
>> test order and use that option with a continuous integration system like
>> Buildbot or Jenkins.  It is especially nice if the test runner tells you
>> the seed it used for the random number generator so you can replicate
>> the test order yourself.
>
> Oh hai :).

:)
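The randomized-order suggestion quoted above can be made concrete with a minimal sketch. It assumes plain unittest rather than nose, and it does not show any nose option. The helper names iter_tests, shuffled_suite and run_shuffled are hypothetical. The sketch flattens a suite, shuffles it with a seeded random.Random, and prints the seed so that a failing order can be replayed exactly.

```python
import random
import unittest


def iter_tests(suite):
    """Flatten a (possibly nested) TestSuite into individual test cases."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def shuffled_suite(suite, seed=None):
    """Return (seed, suite) with the tests of `suite` in a seeded random order.

    Passing the same seed again reproduces the same order.
    """
    if seed is None:
        seed = random.randrange(2**32)
    tests = list(iter_tests(suite))
    random.Random(seed).shuffle(tests)
    return seed, unittest.TestSuite(tests)


def run_shuffled(start_dir, seed=None):
    """Discover tests under `start_dir`, run them shuffled, print the seed."""
    suite = unittest.TestLoader().discover(start_dir)
    seed, suite = shuffled_suite(suite, seed)
    # Print the seed before running so it lands in the CI log even if the run dies.
    print(f"test order seed: {seed}")
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

In CI you would call run_shuffled("core") on every build. When a build fails, you would call run_shuffled("core", seed=...) locally with the logged seed. Shuffling interleaves tests from different classes, so unittest may run setUpClass/tearDownClass more than once per class. That extra setup and teardown can itself expose (or mask) shared-state bugs.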

> You can also (for a specific case) bisect, if you have a consistent
> run order, by running the last 1/2 of the tests leading up to the
> failure, then the last 1/4, etc., until it fails; when it fails, remove
> the last 1/N-1 tests (keeping the one that breaks), and so on and so forth.
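One way to read the bisection as code is a minimal sketch under several assumptions: there is exactly one culprit test, results are deterministic, and a hypothetical callable fails(tests) runs the given tests in order and reports whether the last one (the target) fails. Each round keeps the target and tries only the later half of the remaining candidates, which are the tests closest to the failure.

```python
def bisect_polluter(before, target, fails):
    """Find the single test in `before` that makes `target` fail.

    `before` lists the tests that ran ahead of `target`, in run order.
    `fails(tests)` is a hypothetical callback: it runs `tests` in order and
    returns True if the last one fails. Assumes exactly one culprit and
    that results do not vary between runs.
    """
    candidates = list(before)
    if fails([target]):
        raise ValueError("target fails on its own; not an isolation problem")
    if not fails(candidates + [target]):
        raise ValueError("target does not fail after the full prefix")
    while len(candidates) > 1:
        half = len(candidates) // 2
        tail = candidates[half:]  # the later half, closest to the target
        if fails(tail + [target]):
            candidates = tail  # culprit is in the later half
        else:
            candidates = candidates[:half]  # culprit must be earlier
    return candidates[0]
```

After the two sanity checks, this needs roughly log2(N) runs for N candidates. If the target only breaks when two earlier tests run together, the single-culprit assumption no longer holds and the result can be misleading.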

That is a good point.  Once you have done that, you will have a set
of tests that all pass when run together, but the last one fails when
run by itself.  That's when I usually switch from bisecting to a
slightly different mode:

Let's call these tests A-Z.  It is almost certainly the case that one (or
more) of the A-Y tests mutates global state in a way that Z depends on
(intentionally or unintentionally).  In that case, I first try the
obvious thing and run just A and Z together.  If that works, you
win.  If not, I start throwing away large chunks of A-Y until I find just
the set of tests that make Z pass.  Throwing away tests that seem
unrelated to Z first often yields good results, since tests of related
code/data tend to be the culprits.
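The pare-down step can be sketched as a greedy chunk-removal loop. This is a simplified cousin of delta debugging, used here as one concrete way to do it rather than as the exact procedure above. The loop drops large chunks of A-Y first and keeps any removal after which Z still passes. It then retries with smaller chunks. passes(tests) is a hypothetical callback that runs the tests in order and reports whether the last one passes.

```python
def minimize_enablers(before, target, passes):
    """Shrink `before` to a small subset that still lets `target` pass.

    `passes(tests)` is a hypothetical callback: it runs `tests` in order
    and returns True if the last one (`target`) passes. Order is preserved.
    """
    kept = list(before)
    if not passes(kept + [target]):
        raise ValueError("target does not pass after the full prefix")
    chunk = max(len(kept) // 2, 1)
    while kept:
        i = 0
        removed_any = False
        while i < len(kept):
            trial = kept[:i] + kept[i + chunk:]
            if passes(trial + [target]):
                kept = trial  # this chunk was not needed; drop it
                removed_any = True
            else:
                i += chunk  # something in this chunk is needed; keep it
        if chunk == 1 and not removed_any:
            break  # no single test can be removed any more
        chunk = max(chunk // 2, 1)
    return kept
```

The loop stops only after a full pass with chunk size 1 removes nothing. At that point, removing any single remaining test makes Z fail again. What is left is a short list to inspect by hand.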

After you pare it down to the smallest set that still passes, you are
left with inspecting the remaining tests to figure out why Z depends on
them.

At that point petting a cat usually helps me figure it out.  YPMV
-- 
Benji York