[TIP] why you should distribute tests with your application / module

Jesse Noller jnoller at gmail.com
Wed Sep 17 10:16:13 PDT 2008

On Wed, Sep 17, 2008 at 12:56 PM, Pete <pfein at pobox.com> wrote:
> On Sep 17, 2008, at 10:40 AM, Jesse Noller wrote:
>> Then don't make it completely random - weight the selections instead.
>> My example was focused on random-ish file data that you could *always*
>> reproduce with a given key. For a full text search, you could
>> use the same seed concept, but break the words from your source (a
>> static lorem ipsum file[0]) into groups and assign them
>> popularity/frequencies and so on - or use the generator class from the
>> lorem-ipsum-generator you linked to[1]. Heck use /usr/share/dict/words
>> to generate the data :)
> Y'know, this is starting to sound like a lot of work... all for the purpose
> of generating nonsense text to avoid downloading a file.  Bandwidth is
> cheap, my neurons are not.
> I still maintain that there are situations where it's desirable/preferable
> to have real, static fixture data.  Here are a few more OTOMH: financial
> algorithms, image processing, regression tests of text parsing. Anyone have
> recommendations on best practices for doing so (as opposed to continuing to
> tell me how to avoid the problem entirely)?  Thanks.

A couple of tricks I've used for this. One is to add a module
containing the test data in a compressed format, stored in a variable
(as a string), then uncompress it on demand within the suite and pass
it to the relevant objects. Whether that is practical depends on the
size of the data. Keeping it in a compressed string makes the data
hard to alter, but it keeps the data nice and close to the test that
needs it.
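
A rough sketch of the compressed-variable trick, assuming zlib plus
base64 for the encoding (the post does not say which compression was
used). The names pack_fixture, unpack_fixture, SAMPLE_TEXT and
PACKED_SAMPLE are made up for illustration; in a real suite
PACKED_SAMPLE would be a string literal pasted into the test module
rather than computed at import time.

```python
import base64
import zlib


def pack_fixture(raw):
    """Compress raw fixture bytes and return an ASCII string that can be
    pasted into a test module as a literal."""
    return base64.b64encode(zlib.compress(raw, 9)).decode("ascii")


def unpack_fixture(packed):
    """Reverse pack_fixture: decode the string and decompress on demand."""
    return zlib.decompress(base64.b64decode(packed))


# Hypothetical fixture data, standing in for a real sample file.
SAMPLE_TEXT = b"lorem ipsum dolor sit amet\n" * 50

# Computed here only to keep the sketch self-contained; normally this
# would be the pasted output of pack_fixture().
PACKED_SAMPLE = pack_fixture(SAMPLE_TEXT)
```

A test would call unpack_fixture(PACKED_SAMPLE) in its setup and hand
the bytes to the object under test. Changing the data means
regenerating the literal with pack_fixture, which is the "hard to
alter" cost mentioned above.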

Ultimately, if it's a small enough dataset, I'd include it in a
tests/dataset directory within the src tree/dist and check for its
existence. If it doesn't exist, the test should raise a SkipTest
citing the lack of a dataset.
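
A minimal sketch of that check, assuming the standard library's
unittest.SkipTest (nose's SkipTest would be used the same way). The
helper require_dataset, the file name sample.txt and the test class
are hypothetical; only the tests/dataset directory comes from the
suggestion above.

```python
import os
import unittest


def require_dataset(dataset_dir, filename):
    """Return the path to a fixture file, or raise SkipTest if it is absent."""
    path = os.path.join(dataset_dir, filename)
    if not os.path.isfile(path):
        # The skip message cites the missing dataset.
        raise unittest.SkipTest("dataset not found: %s" % path)
    return path


class ParserRegressionTest(unittest.TestCase):
    # Assumed layout: fixture files live in tests/dataset relative to
    # the directory the tests are run from.
    DATASET_DIR = os.path.join("tests", "dataset")

    def test_parse_sample(self):
        path = require_dataset(self.DATASET_DIR, "sample.txt")
        with open(path, "rb") as handle:
            data = handle.read()
        # Placeholder assertion; a real test would feed data to the parser.
        self.assertTrue(data)
```

With this in place, a checkout or install that lacks the dataset
reports the test as skipped instead of failed.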
