[TIP] Tasting the Cheeseshop
barry at python.org
Fri Oct 14 16:41:58 PDT 2011
I've long had this idea in my head to test all the packages in the Cheeseshop.
I think it would be cool if we could overlay some sense of which versions and
implementations of Python a package is *tested* to be compatible with, and
make available a general health check of a package's test suite.
I've had a little bit of downtime at work recently, so I wrote some
experimental code. I want to share what I have with you now, and get your
feedback because I think this would be an interesting project, and because I
think the basic Python facilities can be improved.
I call the project "Taster" as a play on "tester", "taste tester", and
cheese. :)  The project and code are up on Launchpad, but it's a bit rough and
there's not a lot of documentation.  I'll work on filling out the README
with more details about where I want to go; in the meantime, this email is it.
So the rough idea is this: I want to download and unpack packages from PyPI,
introspect them to see if they have a test suite, then run the test suite in a
protected environment against multiple versions of Python, and collate the
results for publishing.  I'd like to be able to run the tool against, say, all
packages uploaded today, since last month, etc., or run it based on the PyPI
RSS feed so that packages would get tested as they're uploaded.
A related project is to enable testing by default of Python packages when they
are built for Debian. Right now, you have to override the test rules for any
kind of package tests short of `make check` and most Python packages don't
have a Makefile. I've had some discussion with the debhelper maintainer,
and he's amenable but skeptical that there actually *is* a Python standard for
running a package's tests.
Taster contains a few scripts for demonstration. One queries PyPI for a set
of packages to download, one script downloads the sdists of any number of
packages, one unpacks the tarballs/zips, and one runs some tests. It should
be fairly obvious when you grab the branch which script does what.
I had planned on using tox (which is awesome btw :) for doing the multi-Python
tests. I'll describe later why this is a bit problematic. I also planned on
using Jenkins to drive and publish the results, but that's also difficult for
a couple of reasons. Finally, I planned to use something like LXC or
arkose to isolate the tests from the host system so evil packages can't do
evil things. That also turned out to be a bit difficult, so right now I run
the tests in an schroot.
As you might imagine, I've run into a number of problems (heck, this is just
an experiment), and I'm hoping to generate some discussion about what we can
do to make some of the tasks easier. I'm motivated to work on better Python
support if we can come to some agreement about what we can and should do.
On the bright side, querying PyPI, downloading packages, and unpacking them
seems pretty easy. PyPI's data has nice XMLRPC and JSON interfaces, though
it's a bit sad you can't just use the JSON API for everything. No matter,
those are the easy parts.
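To give a feel for how easy that part is, here's a rough sketch of querying
PyPI over XMLRPC for a package's sdists.  The method and key names
(package_releases, release_urls, packagetype) follow the PyPI XMLRPC API;
the endpoint constant and function names are just mine:

```python
# Sketch of the easy part: ask PyPI for a package's releases and pick
# out the sdist download URLs.  Assumes the PyPI XMLRPC API's method
# and key names; everything else here is illustrative.
try:
    from xmlrpc.client import ServerProxy   # Python 3
except ImportError:
    from xmlrpclib import ServerProxy       # Python 2

PYPI = 'http://pypi.python.org/pypi'

def sdist_urls(urls):
    """Filter release_urls() output down to sdist download URLs."""
    return [u['url'] for u in urls if u['packagetype'] == 'sdist']

def latest_sdists(package):
    """Return the sdist URLs for a package's most recent release."""
    client = ServerProxy(PYPI)
    releases = client.package_releases(package)
    if not releases:
        return []
    return sdist_urls(client.release_urls(package, releases[0]))
```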
The first difficulty comes when you want to run a package's tests. In my
mind, *the* blessed API for that should be:
$ python setup.py test
and a package's setup.py (or the equivalent in setup.cfg, which I don't know
yet) would contain something like:

    setup(
        ...
        test_suite='mypackage.tests',
        ...
        )
In fact, I have this in all my own packages now and I think it works well.
Here are some problems:
* setuptools or distribute is required. I don't think the standard distutils
API supports the `test_suite` key.
* egg-info says nothing about `test_suite` so you basically have to grep
setup.py to see if it supports it. Blech.
* `python setup.py test` doesn't provide any exit code feedback if a package
has no test suite (see also below).
* There are *many* other ways that packages expose their tests that don't fit
  into test_suite.  There's no *programmatic* way to know how to run their
  tests.
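Absent real metadata, the best you can do today is the grep I mentioned
above.  A minimal sketch of that workaround (the function name is mine, and
this is exactly the kind of heuristic an egg-info entry would make
unnecessary):

```python
import io

def advertises_test_suite(setup_py_path):
    """Crude heuristic: does this setup.py mention the test_suite key?

    This is the 'grep setup.py' workaround; it can be fooled by
    comments or strings, which is why declarative metadata would
    be so much better.
    """
    with io.open(setup_py_path, encoding='utf-8', errors='replace') as f:
        return 'test_suite' in f.read()
```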
Note that I'm not expecting 100% coverage. Some packages would be difficult
to fully test under a `setup.py test` regime, perhaps because they require
external resources, or environmental preparation before their tests can be
run. I think that's fine. If we could get 80% coverage of packages in the
Cheeseshop, and even if some packages could only run a subset of their full
test suite under this regime, it would still be a *huge* win for quality.
This doesn't even need to be the only API for running a package's test suite.
If you use some other testing regime for development, that's fine too.
What can we do to promote a standard, introspectable, programmatic way of
running a package's test suite? Do you agree that `python setup.py test` or
`pysetup test` is *the* way it should be done? I would be very happy if we
could define a standard, and let convention, best-practice guides, and peer
pressure (i.e. big red banners on PyPI <wink>) drive adoption.
I think we have an opportunity with Python 3.3 to establish these standards so
that as people migrate, they'll naturally adopt them.  I'm confident that
enough of this could be backported to earlier Pythons so that it would all
hang together.
There is no dearth of testing regimes in the Python world; the numbers might
even rival web frameworks. :) I think that's a great strength, and testament
(ha ha) to how critical we think this is for high quality software. I would
never want to dictate which testing regime a package adopts - I just want some
way to *easily* look at a package, find out how to run *some* of its tests, run
them, and dig the results out. `python setup.py test` seems like the closest
thing we have to any kind of standard.
Next problem: reporting results. Many test regimes provide very nice feedback
on the console for displaying the results. I tend to use -vv to get some
increased verbosity, and when I'm just sitting at my package, I can easily see
how healthy my package is. But for programmatic results, it's pretty crappy.
The exit code and output parsing is about all I've got, and that's definitely
no fun, especially given the wide range of testing regimes we have.
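Concretely, a driver today ends up with something like the following sketch,
where the exit code is the only result it can trust (function name is mine;
and remember from above that some setup.py files exit 0 even when there's no
test suite at all):

```python
import subprocess
import sys

def run_test_suite(unpacked_dir, python=sys.executable):
    """Run `python setup.py test` in an unpacked sdist.

    Returns (passed, output).  The boolean comes from the process
    exit code; the output is raw console text, which is all a driver
    has to work with right now.
    """
    proc = subprocess.Popen(
        [python, 'setup.py', 'test'],
        cwd=unpacked_dir,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT)
    output = proc.communicate()[0]
    return proc.returncode == 0, output
```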
My understanding is that py.test is able to output JunitXML, which works well
for Jenkins integration. Ideally, we'd again have some standard reporting
formats that a Python program could consume to explicitly know what happened
during a test run. I'm thinking something like having `python setup.py test`
output a results.json file which contains a summary of the total number of
tests run, the number succeeding, failing, and erroring, and a detailed report
of all the tests, their status, and any failure output that got printed. From
there, it would be fairly straightforward to consume in taster, or transform
into report files for Jenkins integration, etc.
You might even imagine an army of buildbots/jenkins slaves that built packages
and uploaded the results to PyPI for any number of Python versions and
implementations, and these results could be collated and nicely graphed on
each package's page.
Related to this is something I noticed with tox: there are no artifacts except
the console and log file output for the results of the tests in the various
environments.  Console output is on par with screen scraping in its
unhappiness factor. ;)
I think that's roughly the high order bit of the results of my little
experiment. I'm keenly interested to hear your feedback, and of course, if
you want to help move this forward, all the code is free and I'd love to work
with you. It's Friday night and I've rambled on enough...
P.S. The downtime was just the normal Ubuntu release end-of-cycle breather :)
 I've submitted a paper proposal on the idea for Pycon 2012, but I plan on
continuing to work on this even if the paper isn't accepted.