[TIP] Tasting the Cheeseshop

Fri Oct 14 22:03:43 PDT 2011

Barry,

thanks for tackling this and sharing your thoughts!

Two quick inputs from my side:

- tox could easily grow XML or other standardized output but it would
  not report results per test function/item - this is left to
  the individual test tool that is invoked.  If that is standardized
  (and we could regard junitxml as such for now) then we could of course
  embed the junitxml output into the prospective tox output if that helps.

- i don't think it's a good idea to incorporate test standards 
  into setup.py because 
  a) setup.py is executable code, is not introspectable and
  b) setup.py is meant to go away with the new distutils2 for a pure
  declarative approach (tox's config could easily merge in there) and 
  c) it's kind of requiring a specific unittest-pkg based model where tests
  are inlined into the package. There are reasons why a substantial number
  of projects use or need different approaches.
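
As a rough illustration of the first point, per-environment junitxml
files could be folded into one report roughly like this.  This is only a
sketch, not actual tox code: the function names, the report layout and
the environment names are made up, and it assumes the common junitxml
shape where each <testsuite> element carries tests/failures/errors/skipped
attributes (attribute names vary between tools, so missing ones count as 0).

```python
import json
import xml.etree.ElementTree as ET


def summarize_junitxml(xml_text):
    """Return test counts from a junitxml document (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # The root may be a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in totals:
            # Attributes a tool does not emit are treated as zero.
            totals[key] += int(suite.get(key, 0))
    return totals


def build_report(junit_by_env):
    """Map each environment name to its junitxml summary (made-up layout)."""
    return {env: summarize_junitxml(xml) for env, xml in junit_by_env.items()}


if __name__ == "__main__":
    sample = '<testsuite tests="3" failures="1" errors="0" skipped="0"/>'
    print(json.dumps(build_report({"py26": sample, "py27": sample}), indent=2))
```

Per-test detail would still come from the individual test tool; a report
like this only aggregates what that tool already wrote.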

FWIW i am happy to share moving tox forward with other authors - Michael
Foord has expressed interest there as well in the past and Kumar (nose
maintainer) gave a talk about it at the last PyCon.  I'd be glad if tox
evolved to introduce a test running standard that finally merges into setup.cfg.
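
To show what "declarative" buys here, a tool could read such a test
section without executing any project code.  The sketch below is purely
hypothetical: the [test] section and its envlist/command keys are invented
for illustration and are not defined by distutils2 or tox.

```python
import configparser

# Hypothetical setup.cfg fragment; the [test] section and its keys are
# invented for illustration and are not defined by distutils2 or tox.
EXAMPLE_SETUP_CFG = """
[metadata]
name = example

[test]
envlist = py26, py27, py32
command = python -m unittest discover
"""


def read_test_config(text):
    """Return (envlist, command) from config text, or None if undeclared."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("test"):
        # No declared test section: the caller can tell this apart from
        # a declared but failing test run.
        return None
    raw_envs = parser.get("test", "envlist", fallback="")
    envs = [env.strip() for env in raw_envs.split(",") if env.strip()]
    return envs, parser.get("test", "command", fallback=None)


if __name__ == "__main__":
    print(read_test_config(EXAMPLE_SETUP_CFG))
```

Because the file is plain data, the "does this package declare tests at
all?" question gets an answer without grepping or running setup.py.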

best,
holger

On Fri, Oct 14, 2011 at 19:41 -0400, Barry Warsaw wrote:
> I've long had this idea in my head to test all the packages in the Cheeseshop.
> I think it would be cool if we could overlay some sense of which versions and
> implementations of Python a package is *tested* to be compatible with, and
> make available a general health check of a package's test suite.
> 
> I've had a little bit of downtime at work[1] recently, so I wrote some
> experimental code.  I want to share what I have with you now[2], and get your
> feedback because I think this would be an interesting project, and because I
> think the basic Python facilities can be improved.
> 
> I call the project "Taster" as a play on "tester", "taste tester", and
> cheese. :) The project and code are up on Launchpad[3] but it's a bit rough and
> there's not a lot of documentation.  I'll work on filling out the README[4]
> with more details about where I want to go; in the meantime, this email is it.
> 
> So the rough idea is this: I want to download and unpack packages from PyPI,
> introspect them to see if they have a test suite, then run the test suite in a
> protected environment against multiple versions of Python, and collate the
> results for publishing.  I'd like to be able to run the tool against, say all
> packages uploaded today, since last month, etc., or run it based on the PyPI
> RSS feed so that packages would get tested as they're uploaded.
> 
> A related project is to enable testing by default of Python packages when they
> are built for Debian.  Right now, you have to override the test rules for any
> kind of package tests short of `make check` and most Python packages don't
> have a Makefile.  I've had some discussion with the debhelper maintainer[5],
> and he's amenable but skeptical that there actually *is* a Python standard for
> running tests.
> 
> Taster contains a few scripts for demonstration.  One queries PyPI for a set
> of packages to download, one script downloads the sdists of any number of
> packages, one unpacks the tarballs/zips, and one runs some tests.  It should
> be fairly obvious when you grab the branch which script does what.
> 
> I had planned on using tox (which is awesome btw :) for doing the multi-Python
> tests.  I'll describe later why this is a bit problematic.  I also planned on
> using Jenkins to drive and publish the results, but that's also difficult for
> a couple of reasons.  Finally, I planned to use something like LXC or
> arkose[6] to isolate the tests from the host system so evil packages can't do
> evil things.  That also turned out to be a bit difficult, so right now I run
> the tests in an schroot.
> 
> As you might imagine, I've run into a number of problems (heck, this is just
> an experiment), and I'm hoping to generate some discussion about what we can
> do to make some of the tasks easier.  I'm motivated to work on better Python
> support if we can come to some agreement about what we can and should do.
> 
> On the bright side, querying PyPI, downloading packages, and unpacking them
> seems pretty easy.  PyPI's data has nice XMLRPC and JSON interfaces, though
> it's a bit sad you can't just use the JSON API for everything.  No matter,
> those are the easy parts.
> 
> The first difficulty comes when you want to run a package's tests.  In my
> mind, *the* blessed API for that should be:
> 
>     $ python setup.py test
> 
> and a package's setup.py (or the equivalent in setup.cfg, which I don't know
> yet) would contain:
> 
>     setup(
>         ...
>         test_suite='foo.bar.tests',
>         use_2to3=True,
>         convert_2to3_doctests=[mydoctests],
>         ...
>         )
> 
> In fact, I have this in all my own packages now and I think it works well.
> 
> Here are some problems:
> 
>  * setuptools or distribute is required.  I don't think the standard distutils
>    API supports the `test_suite` key.
>  * egg-info says nothing about `test_suite` so you basically have to grep
>    setup.py to see if it supports it.  Blech.
>  * `python setup.py test` doesn't provide any exit code feedback if a package
>    has no test suite (see also below).
>  * There are *many* other ways that packages expose their tests that don't fit
>    into test_suite.  There's no *programmatic* way to know how to run their
>    tests.
> 
> Note that I'm not expecting 100% coverage.  Some packages would be difficult
> to fully test under a `setup.py test` regime, perhaps because they require
> external resources, or environmental preparation before their tests can be
> run.  I think that's fine.  If we could get 80% coverage of packages in the
> Cheeseshop, and even if some packages could only run a subset of their full
> test suite under this regime, it would still be a *huge* win for quality.
> 
> This doesn't even need to be the only API for running a package's test suite.
> If you use some other testing regime for development, that's fine too.
> 
> What can we do to promote a standard, introspectable, programmatic way of
> running a package's test suite?  Do you agree that `python setup.py test` or
> `pysetup test` is *the* way it should be done?  I would be very happy if we
> could define a standard, and let convention, best-practice guides, and peer
> pressure (i.e. big red banners on PyPI <wink>) drive adoption.
> 
> I think we have an opportunity with Python 3.3 to establish these standards so
> that as people migrate, they'll naturally adopt them.  I'm confident enough
> could be backported to earlier Pythons so that it would all hang together
> well.
> 
> There is no dearth of testing regimes in the Python world; the numbers might
> even rival web frameworks. :) I think that's a great strength, and testament
> (ha ha) to how critical we think this is for high quality software.  I would
> never want to dictate which testing regime a package adopts - I just want some
> way to *easily* look at a package, find out how to run *some* of its tests, run
> them, and dig the results out.  `python setup.py test` seems like the closest
> thing we have to any kind of standard.
> 
> Next problem: reporting results.  Many test regimes provide very nice feedback
> on the console for displaying the results.  I tend to use -vv to get some
> increased verbosity, and when I'm just sitting at my package, I can easily see
> how healthy my package is.  But for programmatic results, it's pretty crappy.
> The exit code and output parsing is about all I've got, and that's definitely
> no fun, especially given the wide range of testing regimes we have.
> 
> My understanding is that py.test is able to output JunitXML, which works well
> for Jenkins integration.  Ideally, we'd again have some standard reporting
> formats that a Python program could consume to explicitly know what happened
> during a test run.  I'm thinking something like having `python setup.py test`
> output a results.json file which contains a summary of the total number of
> tests run, the number succeeding, failing, and erroring, and a detailed report
> of all the tests, their status, and any failure output that got printed.  From
> there, it would be fairly straightforward to consume in taster, or transform
> into report files for Jenkins integration, etc.
> 
> You might even imagine an army of buildbots/jenkins slaves that built packages
> and uploaded the results to PyPI for any number of Python versions and
> implementations, and these results could be collated and nicely graphed on
> each package's page.
> 
> Related to this is something I noticed with tox: there are no artifacts except
> the console and log file output for the results of the tests in the various
> environments.  Console output is on par with screen scraping in it's
> unhappiness factor. ;)
> 
> I think that's roughly the high order bit of the results of my little
> experiment.  I'm keenly interested to hear your feedback, and of course, if
> you want to help move this forward, all the code is free and I'd love to work
> with you.  It's Friday night and I've rambled on enough...
> 
> Cheers,
> -Barry
> 
> [1] Normal Ubuntu release end-of-cycle breather :)
> [2] I've submitted a paper proposal on the idea for Pycon 2012, but I plan on
>     continuing to work on this even if the paper isn't accepted.
> [3] https://launchpad.net/taster
> [4] http://bazaar.launchpad.net/~barry/taster/trunk/view/head:/README.rst
> [5] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641314
> [6] https://launchpad.net/arkose
