This sounds similar to <a href="http://pycheesecake.org/">http://pycheesecake.org/</a> by Grig and Michał.  Just making sure you knew that it existed.<div>

<br></div><div>Paul</div><div><br><br><div class="gmail_quote">On Fri, Oct 14, 2011 at 4:41 PM, Barry Warsaw <span dir="ltr">&lt;<a href="mailto:barry@python.org">barry@python.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

I&#39;ve long had this idea in my head to test all the packages in the Cheeseshop.<br>

I think it would be cool if we could overlay some sense of which versions and<br>

implementations of Python a package is *tested* to be compatible with, and<br>

make available a general health check of a package&#39;s test suite.<br>

<br>

I&#39;ve had a little bit of downtime at work[1] recently, so I wrote some<br>

experimental code.  I want to share what I have with you now[2], and get your<br>

feedback because I think this would be an interesting project, and because I<br>

think the basic Python facilities can be improved.<br>

<br>

I call the project &quot;Taster&quot; as a play on &quot;tester&quot;, &quot;taste tester&quot;, and<br>

cheese. :) The project and code are up on Launchpad[3] but it&#39;s a bit rough and<br>

there&#39;s not a lot of documentation.  I&#39;ll work on filling out the README[4]<br>

with more details about where I want to go; in the meantime, this email is it.<br>

<br>

So the rough idea is this: I want to download and unpack packages from PyPI,<br>

introspect them to see if they have a test suite, then run the test suite in a<br>

protected environment against multiple versions of Python, and collate the<br>

results for publishing.  I&#39;d like to be able to run the tool against, say, all<br>

packages uploaded today, since last month, etc., or run it based on the PyPI<br>

RSS feed so that packages would get tested as they&#39;re uploaded.<br>

<br>

A related project is to enable testing by default of Python packages when they<br>

are built for Debian.  Right now, you have to override the test rules for any<br>

kind of package tests short of `make check` and most Python packages don&#39;t<br>

have a Makefile.  I&#39;ve had some discussion with the debhelper maintainer[5],<br>

and he&#39;s amenable but skeptical that there actually *is* a Python standard for<br>

running tests.<br>

<br>

Taster contains a few scripts for demonstration.  One queries PyPI for a set<br>

of packages to download, one script downloads the sdists of any number of<br>

packages, one unpacks the tarballs/zips, and one runs some tests.  It should<br>

be fairly obvious when you grab the branch which script does what.<br>

<br>
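A minimal sketch of the unpacking step, assuming sdists arrive as tarballs or zips on local disk. The function name unpack_sdist is hypothetical and is not taken from the Taster branch. Archives from untrusted packages can contain hostile paths, so this is meant to run inside the isolated environment, not on the host.<br>

```python
import shutil
from pathlib import Path


def unpack_sdist(archive, dest_root):
    """Unpack one sdist (.tar.gz, .tar.bz2, .tgz, .zip) into its own directory."""
    archive = Path(archive)
    # Strip the archive suffix to get a directory name like "foo-1.0".
    name = archive.name
    for suffix in (".tar.gz", ".tar.bz2", ".tgz", ".zip"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    target = Path(dest_root) / name
    target.mkdir(parents=True, exist_ok=True)
    # shutil chooses the unpacker from the file extension.
    shutil.unpack_archive(str(archive), str(target))
    return target
```

Giving each archive its own target directory keeps packages that unpack to the same top-level name from overwriting each other.<br>
<br>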

I had planned on using tox (which is awesome btw :) for doing the multi-Python<br>

tests.  I&#39;ll describe later why this is a bit problematic.  I also planned on<br>

using Jenkins to drive and publish the results, but that&#39;s also difficult for<br>

a couple of reasons.  Finally, I planned to use something like LXC or<br>

arkose[6] to isolate the tests from the host system so evil packages can&#39;t do<br>

evil things.  That also turned out to be a bit difficult, so right now I run<br>

the tests in an schroot.<br>

<br>

As you might imagine, I&#39;ve run into a number of problems (heck, this is just<br>

an experiment), and I&#39;m hoping to generate some discussion about what we can<br>

do to make some of the tasks easier.  I&#39;m motivated to work on better Python<br>

support if we can come to some agreement about what we can and should do.<br>

<br>

On the bright side, querying PyPI, downloading packages, and unpacking them<br>

seems pretty easy.  PyPI&#39;s data has nice XMLRPC and JSON interfaces, though<br>

it&#39;s a bit sad you can&#39;t just use the JSON API for everything.  No matter,<br>

those are the easy parts.<br>

<br>
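As a rough illustration of the JSON side, the sketch below picks the sdist download URLs out of one per-package JSON document. It assumes the document has a top-level "urls" list whose entries carry "packagetype" and "url" keys; the function name sdist_urls is hypothetical. Fetching the document is left out so the example stays offline.<br>

```python
import json


def sdist_urls(release_json):
    """Return download URLs of the sdists listed in one PyPI JSON document."""
    data = json.loads(release_json)
    return [
        entry["url"]
        for entry in data.get("urls", [])
        if entry.get("packagetype") == "sdist"
    ]
```

A package that publishes no sdist simply yields an empty list, which a driver script could log and skip.<br>
<br>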

The first difficulty comes when you want to run a package&#39;s tests.  In my<br>

mind, *the* blessed API for that should be:<br>

<br>

    $ python setup.py test<br>

<br>

and a package&#39;s setup.py (or the equivalent in setup.cfg, which I don&#39;t know<br>

yet) would contain:<br>

<br>

    setup(<br>

        ...<br>

        test_suite=&#39;foo.bar.tests&#39;,<br>

        use_2to3=True,<br>

        convert_2to3_doctests=[mydoctests],<br>

        ...<br>

        )<br>

<br>

In fact, I have this in all my own packages now and I think it works well.<br>

<br>

Here are some problems:<br>

<br>

 * setuptools or distribute is required.  I don&#39;t think the standard distutils<br>

   API supports the `test_suite` key.<br>

 * egg-info says nothing about `test_suite` so you basically have to grep<br>

   setup.py to see if it supports it.  Blech.<br>

 * `python setup.py test` doesn&#39;t provide any exit code feedback if a package<br>

   has no test suite (see also below).<br>

 * There are *many* other ways that packages expose their tests that don&#39;t fit<br>

   into test_suite.  There&#39;s no *programmatic* way to know how to run their<br>

   tests.<br>

<br>
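One step up from grepping setup.py is to parse it without executing it. The sketch below is an assumption about how that could look, not what Taster does: it walks the syntax tree for a setup() call and returns a literal test_suite value. The name find_test_suite is hypothetical. It misses values computed at runtime, and it raises SyntaxError on a setup.py that the running interpreter cannot parse.<br>

```python
import ast


def find_test_suite(setup_source):
    """Return the literal test_suite value passed to setup(), or None."""
    tree = ast.parse(setup_source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both setup(...) and setuptools.setup(...).
        name = getattr(func, "id", None) or getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg == "test_suite" and isinstance(keyword.value, ast.Constant):
                return keyword.value.value
    return None
```

Because nothing from the package is imported or run, this check is safe to perform outside the sandbox.<br>
<br>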

Note that I&#39;m not expecting 100% coverage.  Some packages would be difficult<br>

to fully test under a `setup.py test` regime, perhaps because they require<br>

external resources, or environmental preparation before their tests can be<br>

run.  I think that&#39;s fine.  If we could get 80% coverage of packages in the<br>

Cheeseshop, and even if some packages could only run a subset of their full<br>

test suite under this regime, it would still be a *huge* win for quality.<br>

<br>

This doesn&#39;t even need to be the only API for running a package&#39;s test suite.<br>

If you use some other testing regime for development, that&#39;s fine too.<br>

<br>

What can we do to promote a standard, introspectable, programmatic way of<br>

running a package&#39;s test suite?  Do you agree that `python setup.py test` or<br>

`pysetup test` is *the* way it should be done?  I would be very happy if we<br>

could define a standard, and let convention, best-practice guides, and peer<br>

pressure (i.e. big red banners on PyPI &lt;wink&gt;) drive adoption.<br>

<br>

I think we have an opportunity with Python 3.3 to establish these standards so<br>

that as people migrate, they&#39;ll naturally adopt them.  I&#39;m confident enough<br>

could be backported to earlier Pythons so that it would all hang together<br>

well.<br>

<br>

There is no dearth of testing regimes in the Python world; the numbers might<br>

even rival web frameworks. :) I think that&#39;s a great strength, and testament<br>

(ha ha) to how critical we think this is for high quality software.  I would<br>

never want to dictate which testing regime a package adopts - I just want some<br>

way to *easily* look at a package, find out how to run *some* of its tests, run<br>

them, and dig the results out.  `python setup.py test` seems like the closest<br>

thing we have to any kind of standard.<br>

<br>

Next problem: reporting results.  Many test regimes provide very nice feedback<br>

on the console for displaying the results.  I tend to use -vv to get some<br>

increased verbosity, and when I&#39;m just sitting at my package, I can easily see<br>

how healthy my package is.  But for programmatic results, it&#39;s pretty crappy.<br>

The exit code and output parsing are about all I&#39;ve got, and that&#39;s definitely<br>

no fun, especially given the wide range of testing regimes we have.<br>

<br>

My understanding is that py.test is able to output JunitXML, which works well<br>

for Jenkins integration.  Ideally, we&#39;d again have some standard reporting<br>

formats that a Python program could consume to explicitly know what happened<br>

during a test run.  I&#39;m thinking something like having `python setup.py test`<br>

output a results.json file which contains a summary of the total number of<br>

tests run, the number succeeding, failing, and erroring, and a detailed report<br>

of all the tests, their status, and any failure output that got printed.  From<br>

there, it would be fairly straightforward to consume in taster, or transform<br>

into report files for Jenkins integration, etc.<br>

<br>

You might even imagine an army of buildbots/jenkins slaves that built packages<br>

and uploaded the results to PyPI for any number of Python versions and<br>

implementations, and these results could be collated and nicely graphed on<br>

each package&#39;s page.<br>

<br>

Related to this is something I noticed with tox: there are no artifacts except<br>

the console and log file output for the results of the tests in the various<br>

environments.  Console output is on par with screen scraping in its<br>

unhappiness factor. ;)<br>

<br>

I think that&#39;s roughly the high order bit of the results of my little<br>

experiment.  I&#39;m keenly interested to hear your feedback, and of course, if<br>

you want to help move this forward, all the code is free and I&#39;d love to work<br>

with you.  It&#39;s Friday night and I&#39;ve rambled on enough...<br>

<br>

Cheers,<br>

-Barry<br>

<br>

[1] Normal Ubuntu release end-of-cycle breather :)<br>

[2] I&#39;ve submitted a paper proposal on the idea for Pycon 2012, but I plan on<br>

    continuing to work on this even if the paper isn&#39;t accepted.<br>

[3] <a href="https://launchpad.net/taster" target="_blank">https://launchpad.net/taster</a><br>

[4] <a href="http://bazaar.launchpad.net/~barry/taster/trunk/view/head:/README.rst" target="_blank">http://bazaar.launchpad.net/~barry/taster/trunk/view/head:/README.rst</a><br>

[5] <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641314" target="_blank">http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=641314</a><br>

[6] <a href="https://launchpad.net/arkose" target="_blank">https://launchpad.net/arkose</a><br>

<br></blockquote></div><br></div>