[TIP] test results tracking, etc...

laidler at stsci.edu
Sun Apr 5 10:07:39 PDT 2009

Hi Doug,

I gave a lightning talk at the TIP BoF on a lightweight test
management & reporting system we've developed. It's deliberately
designed with loose coupling among the components so that any
test runner that produces a compatible report can use the database
and reporting interface. We've also got a nose plugin that generates a
compatible report.

It presently supports test statuses pass/fail/error/disabled/missing,
as well as informational attributes that you could use to store
the ineligibility explanations and other keywords. If disabled is
too coarse (because it would conflate skipped and missing), I'm
pretty sure it would be straightforward to make the reporter aware
of custom statuses; and that would be a nice enhancement, too.
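
As a rough sketch of how the five result states from the original message
could be folded into the statuses listed above, with the explanation kept
as an informational attribute: every name here (STATUS_MAP,
to_report_record, the record keys) is hypothetical and is not Pandokia's
actual interface, and the mapping chosen for INCONCLUSIVE is an arbitrary
placeholder.

```python
# Hypothetical sketch: these names are illustrative, not Pandokia's API.

# Statuses the reporting side is described as supporting.
REPORT_STATUSES = {"pass", "fail", "error", "disabled", "missing"}

# Runner-side result names (from the original message) mapped to them.
STATUS_MAP = {
    "PASS": "pass",
    "FAIL": "fail",
    "SKIP": "disabled",
    "INELIGIBLE": "disabled",
    # Assumption: no listed status fits well, so this is a placeholder.
    "INCONCLUSIVE": "disabled",
}


def to_report_record(test_name, runner_status, explanation=None):
    """Build one report record, keeping the original status as an attribute."""
    record = {
        "test_name": test_name,
        "status": STATUS_MAP[runner_status],
        # Keeping the original name means coarse statuses lose nothing.
        "attributes": {"original_status": runner_status},
    }
    if explanation:
        record["attributes"]["explanation"] = explanation
    return record
```

Because the original status travels with the record, a reporter that later
learns about custom statuses could recover the finer distinction.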

The system is called Pandokia, and I'll have a more detailed email
out to the list and some documentation up on the web later this week.
We developed it for internal use, so we have a bit of work to do before
it's ready for release, but we're aiming to release in a month or so.

Vicki Laidler

---- Original message ----
>Date: Sun, 05 Apr 2009 00:33:05 -0400
>From: testing-in-python-bounces at lists.idyll.org (on behalf of Douglas Philips <dgou at mac.com>)
>Subject: [TIP] test results tracking, etc...  
>To: testing-in-python at lists.idyll.org
>Just watched Titus' PyCon 2009 talk
>(http://us.pycon.org/2009/conference/schedule/event/30/). (I wasn't in
>the multiprocessing talk. Best I can figure from my notes, I was at an
>Open Spaces event.)
>A little bit of background on how I'm coming "at" testing:
>My work involves testing hardware attached to Windows systems using  
>Python 2.4.4.
>We have extended unittest and use our own regression-running  
>wrapper around the 2.4.4 framework.
>The test runs are automated with a rack of PCs controlled by buildbot.
>To touch on some of the issues that Titus mentioned in his talk...
>Knowing that all tests have been run is definitely an issue once you  
>get to more than a few hundred.
>- We currently have almost 500.
>- While we use buildbot to orchestrate running tests, and while its  
>waterfall is nice for getting snapshots of status, we don't have a  
>good/easy way to track results over time. Day to day test work changes  
>are filtered through an automated (buildbot) gateway before being  
>released into the main racks. Logs from the gateway are committed to  
>version control. Logs from the rack in buildbot are just in the  
>Our regression test harness always loads all the tests it can find.  
>It logs where they were loaded from and how many there were. If that  
>number ever goes down, something's wrong. (That is just part of what  
>it logs, and these are the same logs I was referring to in the  
>previous paragraph.)
>Because we load all the tests all the time, we have made some  
>enhancements to unittest and unittest.TestCase:
>- We're testing hardware, so when the regression framework starts up,  
>it does some inquiries to see what device it is testing, and sets  
>feature flags based on the results.
>- We also have separate device configuration files which are processed  
>and they also define features that should be present in the device  
>under test.
>- Currently the regression tests are run from the device configuration  
>directory, and so the framework is able to do some simple checks to  
>make sure that the device being tested corresponds to the  
>configuration being used.
>- The tests themselves declare, in various ways:
>	which features they require
>	which features they are incompatible with
>	approximately how long they should take to run
>	which functional areas (via keywords) that they test (1)
>To support this, one of the changes to unittest was to add an  
>eligibility function that is called before the test is run. If a test  
>declares that it requires some feature, and that feature is not  
>present on the device for this test run, the test is marked as  
>ineligible including an explanation as to why. In the regression log,  
>the test is flagged as ineligible (not pass, not fail, ineligible),  
>counted in the "ineligible tests count", and the explanation is  
>We also have used the notion of skipping tests for when the above  
>mechanisms are not sufficient, so a test may be eligible for a  
>particular device, but when run, some sub-feature of the device  
>indicates that the test must be skipped. This is a bit clunky since  
>skip and eligible were categories we thought would have more  
>distinction, but in practice there hasn't been much of a useful  
>Our biggest itch to scratch now is processing, posting, aggregating,  
>etc. test results. We need, to use the agile term, an information  
>I don't know if raw TAP will work since we have more than just PASS/ 
>FAIL for our results. We have PASS, FAIL, SKIP, INELIGIBLE and  
>INCONCLUSIVE. Inconclusive means that while the test didn't fail, it  
>also did not run long enough to produce a statistically valid result.  
>As I mentioned above, SKIP and INELIGIBLE we're probably going to  
>collapse into one state, so that would leave us with four possible  
>test results.
>I'm not sure if this is of interest on this list, since a lot of what  
>I saw and heard at PyCon (and on this list since I've joined) was  
>either plain unit-testing or "functional" testing, which seemed to be  
>a code-word for GUI/Web testing.
>I am hoping that a common test reporting system would still be  
>something we could use and/or contribute to, but regardless we are  
>going to have to pursue a solution. I'd personally prefer to not have to  
>re-invent the wheel. :)
>(1) - keywords are to support internal customers who are debugging new  
>devices and may only want to run certain subsets of the tests
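
The eligibility check described in the quoted message (tests declare the
features they require and the features they are incompatible with, and an
ineligible test is logged with an explanation instead of pass or fail)
could be sketched as below. This is only an illustration on top of the
standard unittest module, not the framework from the message; the names
FeatureTestCase, eligibility, run_all and the feature strings are all
hypothetical, and results are counted per test class for brevity.

```python
import unittest


class FeatureTestCase(unittest.TestCase):
    """Hypothetical base class: subclasses declare feature constraints."""

    requires = frozenset()      # features the test needs
    incompatible = frozenset()  # features the test cannot run with

    @classmethod
    def eligibility(cls, device_features):
        """Return (eligible, explanation) for a set of device features."""
        missing = sorted(set(cls.requires) - set(device_features))
        if missing:
            return False, "missing required features: " + ", ".join(missing)
        clashing = sorted(set(cls.incompatible) & set(device_features))
        if clashing:
            return False, "incompatible features present: " + ", ".join(clashing)
        return True, ""


class DmaTransferTest(FeatureTestCase):
    """Example test with made-up feature names."""

    requires = {"dma"}
    incompatible = {"low_power"}

    def test_transfer(self):
        self.assertTrue(True)  # stand-in for a real hardware check


def run_all(test_classes, device_features):
    """Run eligible classes; log ineligible ones with their explanation."""
    counts = {"PASS": 0, "FAIL": 0, "INELIGIBLE": 0}
    log = []
    for cls in test_classes:
        eligible, why = cls.eligibility(device_features)
        if not eligible:
            counts["INELIGIBLE"] += 1
            log.append((cls.__name__, "INELIGIBLE", why))
            continue
        suite = unittest.defaultTestLoader.loadTestsFromTestCase(cls)
        result = unittest.TestResult()
        suite.run(result)
        status = "PASS" if result.wasSuccessful() else "FAIL"
        counts[status] += 1
        log.append((cls.__name__, status, ""))
    return counts, log
```

Calling run_all([DmaTransferTest], {"usb"}) records the class as
INELIGIBLE with the reason attached, so the test still shows up in the
totals rather than silently disappearing.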
