[TIP] why you should distribute tests with your application / module

C. Titus Brown ctb at msu.edu
Wed Sep 17 10:14:40 PDT 2008


On Wed, Sep 17, 2008 at 11:56:35AM -0500, Pete wrote:
-> On Sep 17, 2008, at 10:40 AM, Jesse Noller wrote:
-> > Then don't make it completely random - weight the selections instead.
-> > My example was focused on random-ish file data that you could *always*
-> > reproduce with a given key. For a full text search, you could
-> > use the same seed concept, but break the words from your source (a
-> > static lorem ipsum file[0]) into groups and assign them
-> > popularity/frequencies and so on - or use the generator class from the
-> > lorem-ipsum-generator you linked to[1]. Heck, use /usr/share/dict/words
-> > to generate the data :)
-> 
-> Y'know, this is starting to sound like a lot of work... all for the  
-> purpose of generating nonsense text to avoid downloading a file.   
-> Bandwidth is cheap, my neurons are not.
-> 
-> I still maintain that there are situations where it's desirable/ 
-> preferable to have real, static fixture data.  Here are a few more  
-> OTOMH: financial algorithms, image processing, regression tests of  
-> text parsing. Anyone have recommendations on best practices for doing  
-> so (as opposed to continuing to tell me how to avoid the problem  
-> entirely)?  Thanks.
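The seeded, weighted generation idea quoted above could be sketched in Python roughly like this. It is a minimal sketch under stated assumptions: `VOCAB` is a hypothetical vocabulary with made-up popularity weights, and `generate_text` is a hypothetical helper, not part of any library mentioned in the thread. The point it illustrates is that a private `random.Random(seed)` instance makes the output identical for a given key, while the weights skew word frequencies.

```python
import random

# Hypothetical vocabulary with made-up popularity weights; real data could be
# read from a static lorem ipsum file or /usr/share/dict/words instead.
VOCAB = {
    "lorem": 50, "ipsum": 40, "dolor": 20, "sit": 30,
    "amet": 30, "consectetur": 5, "adipiscing": 3, "elit": 8,
}


def generate_text(seed, n_words, vocab=VOCAB):
    """Return n_words of weighted pseudo-random text; same seed, same text."""
    rng = random.Random(seed)  # private RNG: global random state is untouched
    words = list(vocab)
    weights = [vocab[w] for w in words]
    # Frequent words are drawn more often, roughly mimicking real text.
    return " ".join(rng.choices(words, weights=weights, k=n_words))
```

For example, `generate_text(42, 20)` yields the same 20 words on every run of the same Python version, and a different seed typically gives a different but equally reproducible text. `random.choices` with `weights` requires Python 3.6 or later.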

I'm not sure what you mean by "best practices" -- for distribution of
the data, or for finding it in the first place?

I can't speak to distribution in particular, but I try to include
regression data sets in the tests for any sufficiently complex framework.
It helps me make sure that stochasticity is only creeping in where it's
*supposed* to.
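One way to apply this is a regression test that compares a seeded stochastic routine against fixture data stored alongside the tests. This is a minimal sketch, assuming a hypothetical `simulate` random walk as the code under test and a hypothetical `matches_fixture` helper that records the reference data once (with `update=True`) and compares against it on later runs.

```python
import json
import random
from pathlib import Path


def simulate(seed, steps=10):
    """Hypothetical stochastic routine under test: a seeded +/-1 random walk."""
    rng = random.Random(seed)  # fixed seed keeps the randomness reproducible
    position, path = 0, []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def matches_fixture(actual, fixture_path, update=False):
    """Return True if `actual` equals the JSON data stored at fixture_path.

    With update=True the fixture is (re)written first, which is how the
    reference data would be recorded the first time.
    """
    fixture_path = Path(fixture_path)
    if update:
        fixture_path.parent.mkdir(parents=True, exist_ok=True)
        fixture_path.write_text(json.dumps(actual, indent=2))
    expected = json.loads(fixture_path.read_text())
    return actual == expected
```

In a test module the fixture path would typically be built relative to the module, e.g. `Path(__file__).parent / "data" / "walk_seed42.json"`, with that data directory shipped alongside the tests (an assumed layout, not one the thread specifies). Because the seed is fixed, any mismatch points to a real behavior change rather than expected randomness.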

--titus
-- 
C. Titus Brown, ctb at msu.edu



More information about the testing-in-python mailing list