[TIP] why you should distribute tests with your application / module

Pete pfein at pobox.com
Wed Sep 17 08:24:05 PDT 2008

On Sep 16, 2008, at 7:02 PM, Jesse Noller wrote:

> On Tue, Sep 16, 2008 at 4:34 PM, Pete <pfein at pobox.com> wrote:
>> On Sep 16, 2008, at 3:07 PM, Jesse Noller wrote:
>>>> What about fixture data though? That can easily get larger than
>>>> the size of the rest of your distribution...
>>> Why not generate the fixture data on the fly though? For example, you
>>> can easily generate file data on the fly (that will always be the
>>> same) each time a test is run - I do this with file sizes ranging from
>>> 1 MB to 100s of gigabytes. This way I don't need to check in test
>>> data, or store it. I just generate it from the ether. The same applies
>>> to database/fixture data - why not generate it from some seed/ID on
>>> the fly?
>> Because I need to know what's in the data so that I can verify that
>> full text queries against it return the correct results. How do you
>> make sure your code is giving you the right output if you're feeding
>> it random input? Makes no sense to me...
>> --Pete
> Here is the nominal example. Normally, instead of using __file__ for
> the data, I'd use a lorem ipsum file. Note that doing it this way

> The use case is simple - you need to generate large strings or data to
> put in a file on an HTTP server (streaming from memory to the pycurl
> object) or stream it to a file to test a filesystem.
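The example Jesse refers to is not included in this archived copy. As a minimal sketch of the idea he describes (data that is identical on every run because it is derived from a seed), assuming Python 3.6+ for `hashlib.shake_256`: the function names `generate_chunks` and `stream_fixture` are hypothetical, and hashing a seed plus a counter is a swapped-in technique, not necessarily his.

```python
import hashlib
import io


def generate_chunks(seed, total_size, chunk_size=1 << 20):
    """Yield exactly total_size bytes derived only from seed (a bytes value).

    Chunk i is the first n bytes of SHAKE-256(seed || i), so the same
    (seed, total_size, chunk_size) always yields the same bytes, on any
    machine and any run, without storing the data anywhere.
    """
    counter = 0
    remaining = total_size
    while remaining > 0:
        n = min(chunk_size, remaining)
        # SHAKE-256 is an extendable-output hash: digest(n) returns n bytes.
        yield hashlib.shake_256(seed + counter.to_bytes(8, "big")).digest(n)
        counter += 1
        remaining -= n


def stream_fixture(out, seed, total_size):
    """Write generated data to a binary file-like object; return its SHA-256."""
    checksum = hashlib.sha256()
    for chunk in generate_chunks(seed, total_size):
        out.write(chunk)
        checksum.update(chunk)
    return checksum.hexdigest()


if __name__ == "__main__":
    # An in-memory sink here; a real test might pass an open file instead.
    first = stream_fixture(io.BytesIO(), b"test-run-1", 5 * 2**20)
    second = stream_fixture(io.BytesIO(), b"test-run-1", 5 * 2**20)
    print(first == second)  # True: same seed, same bytes
```

Because the output is a pure function of the seed, a test only needs to record the seed (and, if useful, the checksum) rather than the data itself.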

A random stream of bytes/words isn't going to work for me. Remember, I'm
doing tests on full text searches[0] - a large variety of words is
essential. Representative frequencies (i.e., 'the' appears a lot) are
also somewhat important. Sensible ordering is nice. And so forth.
Permuting lorem ipsum a few thousand ways to Sunday is not going to
give me the kind of data I need to test effectively (let alone

The only text generator I've seen that looks feasible is this thing,  
and it'd take some hacking to make it work for my purposes: http://code.google.com/p/lorem-ipsum-generator/
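To make "representative frequencies" concrete, here is a minimal sketch of one common approximation: drawing words with probability proportional to 1/rank (Zipf's law). It is not code from the linked project, and it addresses only word frequency, not variety at scale or sensible ordering. The vocabulary and the helpers `zipf_documents` and `expected_hits` are hypothetical. Because generation is seeded, a brute-force scan over the generated documents can serve as an oracle for expected matches; whether that is adequate ground truth for full text search is exactly the question raised above.

```python
import random
from collections import Counter


def zipf_documents(vocabulary, n_docs, words_per_doc, seed=0):
    """Generate n_docs space-separated documents of words_per_doc words each.

    vocabulary must be ordered from most to least common: the word at
    1-based rank r is drawn with probability proportional to 1/r, a
    rough Zipf's-law approximation of natural-language word frequency.
    """
    rng = random.Random(seed)  # seeded, so the corpus is reproducible
    weights = [1.0 / rank for rank in range(1, len(vocabulary) + 1)]
    return [
        " ".join(rng.choices(vocabulary, weights=weights, k=words_per_doc))
        for _ in range(n_docs)
    ]


def expected_hits(documents, term):
    """Brute-force oracle: indices of documents containing term as a whole word."""
    return [i for i, doc in enumerate(documents) if term in doc.split()]


if __name__ == "__main__":
    # Hypothetical, tiny vocabulary; a realistic corpus needs thousands of words.
    vocab = ["the", "of", "and", "search", "index", "query", "token", "ranking"]
    docs = zipf_documents(vocab, n_docs=1000, words_per_doc=50, seed=42)
    print(Counter(w for d in docs for w in d.split()).most_common(3))
    print(len(expected_hits(docs, "ranking")), "documents mention 'ranking'")
```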

I rather like the idea of generating fixture data on the fly; I just
don't think it's going to work for me here. I can imagine other
situations where it wouldn't work either (e.g., testing numerical
algorithms against a known good result).
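For the numerical case, what makes the test useful is the reference answer, not the input. A minimal sketch, using a hypothetical `trapezoid` routine checked against an analytic reference value; in practice the reference is often a stored output from a trusted implementation, which is fixture data that cannot simply be regenerated on the fly.

```python
import math
import unittest


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b], n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


class KnownGoodResultTest(unittest.TestCase):
    def test_sin_over_half_period(self):
        # Reference value known in advance: the integral of sin from 0 to pi is 2.
        result = trapezoid(math.sin, 0.0, math.pi, 1000)
        self.assertTrue(math.isclose(result, 2.0, rel_tol=1e-5))


if __name__ == "__main__":
    unittest.main()
```

The tolerance is chosen to sit well above the trapezoid rule's discretization error at n = 1000, so the test fails on a real bug rather than on expected approximation error.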

We still don't have a nice solution for situations where generating
fixture data isn't an option. One approach would be to require users
to download fixture data and just skip the tests if it isn't
available. Hmm, maybe a nose plugin to automate that?
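The skipping half of that idea can be expressed with the standard library's `unittest.skipUnless` decorator rather than a nose plugin. A minimal sketch, assuming Python 3, a hypothetical `FIXTURE_DIR` environment variable, and a hypothetical corpus file name; fetching the data (the part a plugin might automate) is left out.

```python
import os
import unittest

# Hypothetical layout: users download the corpus into FIXTURE_DIR
# (default ./fixtures, overridable via an environment variable).
FIXTURE_DIR = os.environ.get("FIXTURE_DIR", "fixtures")
CORPUS_PATH = os.path.join(FIXTURE_DIR, "fulltext_corpus.txt")


def fixture_available(path=CORPUS_PATH):
    """True if the downloaded fixture file is present."""
    return os.path.isfile(path)


@unittest.skipUnless(fixture_available(), "fixture data not downloaded: " + CORPUS_PATH)
class FullTextSearchFixtureTest(unittest.TestCase):
    def test_corpus_is_not_empty(self):
        with open(CORPUS_PATH, encoding="utf-8") as f:
            self.assertTrue(f.read(1), "corpus file is empty")


if __name__ == "__main__":
    unittest.main()
```

When the file is missing, the test is reported as skipped with the reason string instead of failing, so the rest of the suite still runs for users who haven't downloaded the data.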


[0] http://en.wikipedia.org/wiki/Full_text_search

More information about the testing-in-python mailing list