[TIP] test results tracking, etc...
Douglas Philips
dgou at mac.com
Sat Apr 4 21:33:05 PDT 2009
Just watched Titus' PyCon 2009 talk
(http://us.pycon.org/2009/conference/schedule/event/30/). (I wasn't in
the multiprocessing talk. Best I can figure from my notes, I was at an
Open Spaces event.)
A little bit of background on how I'm coming "at" testing:
My work involves testing hardware attached to Windows systems using
Python 2.4.4.
We have extended unittest and use our own regression-running
wrapper around the 2.4.4 framework.
The test runs are automated with a rack of PCs controlled by buildbot.
To touch on some of the issues that Titus mentioned in his talk...
Knowing that all tests have been run is definitely an issue once you
get to more than a few hundred.
- We currently have almost 500.
- While we use buildbot to orchestrate running tests, and while its
waterfall is nice for getting snapshots of status, we don't have a
good/easy way to track results over time. Day-to-day test changes are
filtered through an automated (buildbot) gateway before being released
to the main racks. Logs from the gateway are committed to version
control, but logs from the racks live only in the buildbot waterfall.
Our regression test harness always loads all the tests it can find.
It logs where they were loaded from and how many there were. If that
number ever goes down, something's wrong. (That is just part of what
it logs, and these are the same logs I was referring to in the
previous paragraph.)
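
A minimal sketch of the "test count must never go down" check described
above. It is written for current Python 3 rather than 2.4.4, and the
names check_test_count and _ExampleTest are hypothetical, not part of
the harness described in this message:

```python
import unittest


def check_test_count(current, previous):
    """Return a warning string if fewer tests loaded than last time.

    `previous` is the count from an earlier run (None if unknown).
    Returns None when nothing looks wrong.
    """
    if previous is not None and current < previous:
        return "WARNING: loaded %d tests, down from %d" % (current, previous)
    return None


class _ExampleTest(unittest.TestCase):
    """Stand-in test case so the sketch has something to load."""

    def test_one(self):
        pass

    def test_two(self):
        pass


# Load the tests and count them; TestSuite.countTestCases() also
# counts tests inside nested suites.
suite = unittest.TestLoader().loadTestsFromTestCase(_ExampleTest)
loaded = suite.countTestCases()

# Hypothetical previous count, e.g. parsed from an earlier log.
warning = check_test_count(loaded, previous=3)
```

Here the earlier run is assumed to have loaded 3 tests, so the check
produces a warning. Where the previous count is stored (a log file,
version control) is left open.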
Because we load all the tests all the time, we have made some
enhancements to unittest and unittest.TestCase:
- We're testing hardware, so when the regression framework starts up,
it does some inquiries to see what device it is testing, and sets
feature flags based on the results.
- We also have separate device configuration files which are processed
and they also define features that should be present in the device
under test.
- Currently the regression tests are run from the device configuration
directory, and so the framework is able to do some simple checks to
make sure that the device being tested corresponds to the
configuration being used.
- The tests themselves declare, in various ways:
    - which features they require
    - which features they are incompatible with
    - approximately how long they should take to run
    - which functional areas (via keywords) they test (1)
To support this, one of the changes to unittest was to add an
eligibility function that is called before the test is run. If a test
declares that it requires some feature, and that feature is not
present on the device for this test run, the test is marked as
ineligible including an explanation as to why. In the regression log,
the test is flagged as ineligible (not pass, not fail, ineligible),
counted in the "ineligible tests count", and the explanation is
included.
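
The eligibility check described above might look roughly like the
following. This is only a sketch in current Python 3, not the actual
change made to unittest. The function name check_eligibility and the
feature names are hypothetical:

```python
def check_eligibility(required, incompatible, device_features):
    """Decide whether a test may run on the device under test.

    Returns (eligible, explanation). The explanation is empty when the
    test is eligible, and otherwise says why it is not, so it can be
    written to the regression log.
    """
    present = set(device_features)

    # Features the test needs but the device does not have.
    missing = sorted(set(required) - present)
    if missing:
        return False, "requires missing feature(s): " + ", ".join(missing)

    # Features the device has that the test cannot run with.
    conflicts = sorted(set(incompatible) & present)
    if conflicts:
        return False, "incompatible with feature(s): " + ", ".join(conflicts)

    return True, ""


# Hypothetical feature flags set at startup for the device under test.
device = {"feature_a", "feature_b"}

needs_c = check_eligibility(["feature_a", "feature_c"], [], device)
rejects_b = check_eligibility(["feature_a"], ["feature_b"], device)
runs_fine = check_eligibility(["feature_a"], ["feature_c"], device)
```

A runner would call this before each test and record an INELIGIBLE
result, with the explanation, instead of running the test when the
first element is False.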
We also use the notion of skipping tests when the above mechanisms are
not sufficient: a test may be eligible for a particular device, but
when run, some sub-feature of the device indicates that the test must
be skipped. This is a bit clunky. We expected skip and ineligible to be
more distinct categories, but in practice there hasn't been much of a
useful distinction.
Our biggest itch to scratch now is processing, posting, aggregating,
etc. test results. We need, to use the agile term, an information
radiator.
I don't know if raw TAP will work since we have more than just
PASS/FAIL for our results. We have PASS, FAIL, SKIP, INELIGIBLE and
INCONCLUSIVE. Inconclusive means that while the test didn't fail, it
also did not run long enough to produce a statistically valid result.
As I mentioned above, SKIP and INELIGIBLE we're probably going to
collapse into one state, so that would leave us with four possible
test results.
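
As an illustration of the result states above, here is a small sketch
that tallies results and collapses INELIGIBLE into SKIP. Which name the
merged state keeps is an assumption (SKIP is used here), and summarize
is a hypothetical helper written in current Python 3:

```python
from collections import Counter

# The five result states named above.
PASS, FAIL, SKIP, INELIGIBLE, INCONCLUSIVE = (
    "PASS", "FAIL", "SKIP", "INELIGIBLE", "INCONCLUSIVE")

# Assumption: the collapsed state is reported under the name SKIP.
COLLAPSE = {INELIGIBLE: SKIP}

# The four states left after collapsing, in reporting order.
REPORTED = (PASS, FAIL, SKIP, INCONCLUSIVE)


def summarize(results):
    """Count results per state after merging INELIGIBLE into SKIP."""
    counts = Counter(COLLAPSE.get(state, state) for state in results)
    return {state: counts[state] for state in REPORTED}


summary = summarize([PASS, FAIL, SKIP, INELIGIBLE, INCONCLUSIVE, PASS])
```

States with no results still appear in the summary with a count of
zero, so a report always has the same four columns.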
I'm not sure if this is of interest on this list, since a lot of what
I saw and heard at PyCon (and on this list since I've joined) was
either plain unit-testing or "functional" testing, which seemed to be
a code-word for GUI/Web testing.
I am hoping that a common test reporting system would still be
something we could use and/or contribute to, but regardless we are
going to have to pursue a solution. I'd personally prefer to not have to
re-invent the wheel. :)
--Doug
(1) - keywords are to support internal customers who are debugging new
devices and may only want to run certain subsets of the tests