[TIP] why you should distribute tests with your application / module
C. Titus Brown
ctb at msu.edu
Wed Sep 17 10:14:40 PDT 2008
On Wed, Sep 17, 2008 at 11:56:35AM -0500, Pete wrote:
-> On Sep 17, 2008, at 10:40 AM, Jesse Noller wrote:
-> > Then don't make it completely random - weight the selections instead.
-> > My example was focused on random-ish file data that you could *always*
-> > reproduce with a given key. For a full text search, you could
-> > use the same seed concept, but break the words from your source (a
-> > static lorem ipsum file[0]) into groups and assign them
-> > popularity/frequencies and so on - or use the generator class from the
-> > lorem-ipsum-generator you linked to[1]. Heck use /usr/share/dict/words
-> > to generate the data :)
->
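The seeded, weighted selection Jesse describes might look roughly like the sketch below. The word list, the weights, and the name generate_text are all made up for illustration; a real version could read a static lorem ipsum file or /usr/share/dict/words instead.

```python
import random

# Hypothetical vocabulary and relative popularity; both are assumptions
# for illustration, not anything taken from the thread.
WORDS = ["lorem", "ipsum", "dolor", "sit", "amet"]
WEIGHTS = [5, 3, 2, 1, 1]


def generate_text(seed, n_words=20):
    """Return n_words space-separated words, reproducible for a given seed."""
    # A private Random instance keeps the output independent of any other
    # code that touches the global random state.
    rng = random.Random(seed)
    return " ".join(rng.choices(WORDS, weights=WEIGHTS, k=n_words))
```

Calling generate_text(42) twice gives the same string both times, which is the "always reproduce with a given key" property.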
-> Y'know, this is starting to sound like a lot of work... all for the
-> purpose of generating nonsense text to avoid downloading a file.
-> Bandwidth is cheap, my neurons are not.
->
-> I still maintain that there are situations where it's desirable/
-> preferable to have real, static fixture data. Here are a few more
-> OTOMH: financial algorithms, image processing, regression tests of
-> text parsing. Anyone have recommendations on best practices for doing
-> so (as opposed to continuing to tell me how to avoid the problem
-> entirely)? Thanks.
I'm not sure what you mean by "best practices" -- for distribution of
the data, or for finding it in the first place?
I can't speak particularly to distribution, but I try to include data
sets for regression in any tests for sufficiently complex frameworks.
It helps me make sure that stochasticity is only creeping in where it's
*supposed* to.
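
As a rough sketch of what I mean by a regression data set (simulate and matches_fixture are hypothetical names, and the JSON fixture format is just an assumption), a test can compare a seeded run against results stored alongside the tests:

```python
import json
import random
from pathlib import Path


def simulate(seed, steps=5):
    """Hypothetical stand-in for a framework routine that uses randomness."""
    rng = random.Random(seed)
    return [round(rng.random(), 6) for _ in range(steps)]


def matches_fixture(path, seed):
    """Return True if a seeded run equals the results stored at path."""
    expected = json.loads(Path(path).read_text())
    return simulate(seed) == expected
```

If the comparison fails for a seed that used to pass, randomness (or some other change) has leaked into a code path that was supposed to be deterministic.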
--titus
--
C. Titus Brown, ctb at msu.edu