[bip] Reproducible research

Andrew Dalke dalke at dalkescientific.com
Wed Mar 4 07:16:35 PST 2009


On Mar 4, 2009, at 12:14 PM, Giovanni Marco Dall'Olio wrote:
> Another solution to reproducibility, in a perfect world, would be that
> people write good tests for their programs.
> Let's say that I write a program that predicts the coding sequences in
> a nucleotide sequence.
> If I provide good tests for it, people should be able to reproduce my
> analysis and understand it even if they don't know the programming
> language that I used, or even without having the source code of
> my scripts.

What defines the "good tests" in "If I provide good tests"?
Or was the following text an elaboration?

Aren't most of the factors that make tests good the same
ones that make code good?

If I write my tests in, say, APL, wouldn't people need to know
that language to understand the tests? What if the variables
were in Russian, as was the case in one code base I was trying
to debug?

If I've written a probabilistic tester for primality, my tests
amount to "is X prime?", and that's not going to help anyone
understand the algorithm.
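
To make that concrete, here's a minimal Python sketch of a
Miller-Rabin-style probabilistic primality test (the names and the
test values are mine, just for illustration). The asserts at the
bottom are the entire "test suite", and they say nothing about
witnesses, squarings, or why the method works:

    import random

    def is_probably_prime(n, rounds=20):
        # Miller-Rabin: a probabilistic primality test.
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13):
            if n % p == 0:
                return n == p
        # Write n - 1 as d * 2**r with d odd.
        d, r = n - 1, 0
        while d % 2 == 0:
            d //= 2
            r += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False   # found a witness: n is composite
        return True            # no witness found in 'rounds' tries

    # The "tests": outcome checks only, nothing about the method.
    assert is_probably_prime(104729)    # the 10000th prime
    assert not is_probably_prime(561)   # Carmichael number, composite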

If I write a new sorting algorithm, the tests are "is the result
sorted?", which again reveals nothing about the inner workings
of the algorithm itself.
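
In Python terms, that black-box test might look like this sketch,
where my_sort is a stand-in for whatever new algorithm you wrote.
It checks ordering and that the output is a permutation of the
input, and it would pass identically for quicksort, mergesort, or
bubble sort:

    import random
    from collections import Counter

    def check_sort(my_sort, trials=1000):
        for _ in range(trials):
            xs = [random.randint(-100, 100)
                  for _ in range(random.randint(0, 50))]
            out = my_sort(xs)
            # The result is in nondecreasing order...
            assert all(a <= b for a, b in zip(out, out[1:]))
            # ...and uses exactly the input's elements.
            assert Counter(out) == Counter(xs)

    check_sort(sorted)   # the built-in passes the very same tests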

> Let's say I write a program to convert a FASTA sequence to GenBank
> format. Instead of relying on you to look at the source code, I'll
> tell you that I have tested the script on a blank sequence, a
> sequence with a blank line in the middle, a sequence with a wrong
> header, etc., and I provide instructions so you can run these
> tests again if you need to.

And if I think you missed an important test case? How much
of the infinite input space do you have to test before you can
convince someone else that the code is correct? The Pentium
FDIV bug shows that lots of tests still don't catch everything.
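
To spell the scenario out, the proposed tests might look something
like this sketch (fasta2genbank and convert are hypothetical names,
and the error behavior is assumed). Each case is one point in the
input space; nothing here speaks for the inputs not listed:

    import pytest                       # assumed test runner
    from fasta2genbank import convert   # hypothetical module under test

    def test_blank_sequence():
        convert(">empty\n")             # should not raise

    def test_blank_line_in_middle():
        convert(">seq1\nACGT\n\nTGCA\n")

    def test_wrong_header():
        with pytest.raises(ValueError):  # assumed failure mode
            convert("no '>' header here\nACGT\n")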

In your scenario, suppose you had a hard-coded buffer for
reading lines from the FASTA file. Some FASTA files have very
long header lines, like those listing all the record
identifiers associated with that sequence. Perhaps there's a
buffer overflow you didn't notice, one that occurs only in
rare cases and corrupts the results?
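
Python doesn't overflow buffers the way C does, but here's a
hypothetical analog of the same failure mode: a reader with a
hard-coded line buffer. readline(bufsize) returns at most bufsize
characters, so the tail of a long header comes back on the next
call and gets treated as sequence data. Every test with short
headers passes; the rare long-header file is silently corrupted:

    def parse_fasta(path, bufsize=80):
        # Hypothetical buggy reader with a hard-coded "buffer".
        records = {}
        header = None
        with open(path) as f:
            while True:
                line = f.readline(bufsize)   # at most bufsize chars
                if not line:
                    break
                line = line.rstrip("\n")
                if line.startswith(">"):
                    header = line[1:]
                    records[header] = []
                elif header is not None:
                    # The failure: the tail of a >80-char header
                    # lands here and is appended as "sequence".
                    records[header].append(line)
        return {h: "".join(seq) for h, seq in records.items()}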

I've given an example from years ago, when the AC line of
SWISS-PROT went from one line to one-or-more lines. The BioPerl
parser didn't handle that case, and no one noticed for a
year. The code ended up reporting only the accession numbers
from the last AC line. Tests wouldn't have helped, because
when that code was written there were no multi-line AC
fields and the spec said it was only a single line.
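
The bug class looks like this (an illustrative Python sketch, not
the actual BioPerl code): a parser written against the single-line
spec overwrites instead of appending, so only the last AC line
survives:

    def parse_accessions(record_lines):
        accessions = []
        for line in record_lines:
            if line.startswith("AC   "):
                # Overwrites: correct under the old one-line spec,
                # silently wrong once AC can span several lines.
                accessions = line[5:].rstrip().rstrip(";").split("; ")
        return accessions

    record = [
        "ID   EXAMPLE_HUMAN",
        "AC   P00001; P00002;",
        "AC   P00003;",
    ]
    print(parse_accessions(record))   # ['P00003']: the first two are lost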

> For another example, imagine if all the openbio projects had a
> common place to store their use cases and tests. Wouldn't it be
> easier to compare the various bio.* projects and see how each
> one approaches each problem?

There have been on-and-off projects along those lines for years,
such as collecting different BLAST outputs as a shared test
reference. It's a thankless job, and it's never gone anywhere.

				Andrew
				dalke at dalkescientific.com




