[bip] Reproducible research

Leighton Pritchard lpritc at scri.ac.uk
Mon Mar 9 03:42:47 PDT 2009


Hi all,

Apologies for the length.  It's a character trait ;)

On 07/03/2009 04:39, "C. Titus Brown" <ctb at msu.edu> wrote:

> On Thu, Mar 05, 2009 at 09:51:48AM +0000, Leighton Pritchard wrote:
> -> There's another issue with reproducing work from others' publications that
> -> hasn't come up yet: the work is frequently described inadequately for
> -> reproduction, in the methods section.
> -> 
> -> In my experience, this is depressingly often the case for publications that
> -> apply bioinformatics.
> [...] There's very little incentive for an accurate description of
> the process by which you arrived at your results.
> I am mildly skeptical that there's significant value to demanding
> exact reproducibility in many circumstances.

We could quibble over what you mean by 'exact' in that statement but, in
general terms, if your work is not reproducible you are not doing Science,
but rather Pseudoscience (or, in a best-case scenario, 'hypothesis
generation').  In effect, you're doing no more than generating an anecdote
(which isn't as denigrating as it might sound - many anecdotes have proven
to be useful starting points for real insight).  Reproducibility is not a
*sufficient* condition for a correct result describing the 'true' state of
the world around us, but it is *necessary* for the Scientific Method.

There should, in general, be enough information in each publication to
enable a competent researcher to reproduce the results (within a reasonable
level of variation) and so to verify the published claims.  Additionally,
that description should enable an informed reader to judge the relative
merits of the work, and how feasible the claimed results appear to be.
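
As an aside, capturing that sort of minimal record needn't be onerous; a few
lines of Python at analysis time are enough.  The sketch below is purely
illustrative - the tool, database and parameter names are hypothetical
placeholders, not a prescription:

import datetime
import json
import subprocess

# A sketch only: record what was run, against what, and with which settings,
# alongside the results.  Names and values here are hypothetical.
record = {
    "date": datetime.date.today().isoformat(),
    "program": "blastn",
    "program_version": subprocess.check_output(
        ["blastn", "-version"]).decode().strip(),
    "database": "my_genome_cds.fasta",            # placeholder database name
    "parameters": {"evalue": 1e-6, "gapopen": 5, "gapextend": 2},
}
with open("analysis_provenance.json", "w") as handle:
    json.dump(record, handle, indent=2)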

> I can think of many situations where it doesn't matter (finding a
> particular set of genes in a genome;

I think it does matter, even then.  If, say, I claim to have found new
members of a gene family in a bacterial genome by running a BLAST search,
and don't tell you anything else, such as what database or parameters I
used, then either of the following, amongst many other options, may be true:

Option 1: E-value cutoff = 10 with a BLASTN search and very lax gap
opening/extension parameters; the database is the whole genome nucleotide
sequence.  All matches are considered to be part of the same gene family,
whether or not a match crosses an in-frame stop codon, for example.

Option 2: Matches are evaluated on the basis of bit scores, using an initial
E-value threshold of 1e-6 with TBLASTX and strict gap parameters.  The
database only comprises predicted CDS from our genome of interest.  Matches
that did not include exact matches to functionally-required residues were
discarded.

There's potentially quite a bit of difference between those two outcomes,
even though both descriptions are short but informative.  If I don't tell
you which I've used, how are you supposed to evaluate or reproduce the work?
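
To make the contrast concrete, here is a rough sketch of what those two
set-ups might look like when driven from Python via the BLAST+ command-line
tools.  The query, database and output names are made up, and Option 2's
bit-score and residue filtering is left out:

import subprocess

QUERY = "candidate_family_member.fasta"   # hypothetical query file

# Option 1: permissive BLASTN against the whole genome nucleotide sequence.
# (The lax -gapopen/-gapextend settings would also need to be reported.)
subprocess.check_call([
    "blastn", "-query", QUERY, "-db", "whole_genome",
    "-evalue", "10", "-outfmt", "6", "-out", "option1_hits.tab",
])

# Option 2: stricter TBLASTX against predicted CDS only; the resulting hits
# would then be filtered further on bit score and conserved residues.
subprocess.check_call([
    "tblastx", "-query", QUERY, "-db", "predicted_cds",
    "-evalue", "1e-6", "-outfmt", "6", "-out", "option2_hits.tab",
])

Almost every parameter that differs between the two calls is exactly the
information a reader needs in order to evaluate or repeat the search.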

> doing a particular statistical analysis on a data set; etc.).

I've seen plenty of microarray papers that describe "P<0.05" thresholds for
significance in exploratory work, without saying whether they mean an
individual per-probe probability value or a value corrected for multiple
testing.  The implications are, again, potentially rather different in each
case.
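
If the distinction seems academic, a toy example (arbitrary simulated
values, standard library only) shows how differently a raw P<0.05 cut-off
and a Bonferroni-corrected one behave across thousands of probes with no
real signal at all:

import random

random.seed(42)

# Simulate per-probe P values for 10,000 probes where nothing is really
# differentially expressed: P values are uniform on [0, 1].
pvalues = [random.random() for _ in range(10000)]

alpha = 0.05
raw_hits = sum(p < alpha for p in pvalues)
# Bonferroni correction: divide the threshold by the number of tests.
corrected_hits = sum(p < alpha / len(pvalues) for p in pvalues)

print("Probes passing raw P<0.05:          ", raw_hits)        # around 500
print("Probes passing Bonferroni threshold:", corrected_hits)  # usually 0

Roughly five hundred 'significant' probes from pure noise under the raw
threshold, versus essentially none after correction - which is exactly why
the paper needs to say which was used.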

> Are we guilty of adhering to the dogma of reproducibility at the expense
> of pragmatically cutting our losses and simply doing enough that our
> steps could probably be retraced by someone who was really interested?

Reproducibility is not a dogma, any more than any other step in any other
algorithm is.  It is a *necessary* step in the Scientific Method.  As the
paper you linked to
(http://www.cs.toronto.edu/~gvwilson/reading/mccullough-lessons-jmcb.pdf)
pointed out, it is a cornerstone of the Scientific Method.  That doesn't
mean that every experiment *has* to be reproduced: only that it, and its
results, are reproducible.

Your mileage may vary case-by-case in terms of what you consider necessary
(or minimal) information for reproducibility, of course.  The MIAME and
related dataset specifications attempt to codify this under some particular
circumstances.
 
> [...] I think it's much more important that published
> software be open (so I can re-run it on their or other data sets) and
> published data sets be available (so I can run my own tools on their
> data set) than that I can reproduce their exact results.

To take a slightly facetious example:
http://www.foxnews.com/story/0,2933,406101,00.html

I could buy the exact same camera, and take the exact same photographs with
it, as those hoaxers did.  I wouldn't necessarily have to use the same
settings on the camera(*).  This would be analogous to using the same
software and producing the same resulting data set from the same initial
data.  The fact remains that I'd be photographing a rubber suit, not a
Bigfoot corpse.  

In that case, reproducibility of results does not prove that the suit is or
is not Bigfoot, which on the face of it might seem to support the case that
reproducibility is perhaps not important.  However, if you were to contest
the claim that their photos were of Bigfoot, and then demonstrated that you
could reproduce the same results with a rubber suit hired from a shop - but
otherwise using their exact same methods - that would permit questioning of
the claims directly, rather than the method.  If you'd used a different
method for capturing the image of the hired suit - pencil sketch, say - then
the claimants could reasonably say that you weren't following their
protocol, and so you have no compelling reason to doubt their claims (unless
you actually examine the suit, of course ;) ).

(* Regarding the camera analogy, there's a reason why police/forensic
photographers, in particular, correct for white balance, insert scales into
the image and take careful note of parameters such as aperture, shutter
speed and focal length: the settings change the properties of the image.
When the resulting images are relied on as evidence, such things can be
hugely important.)

> Of course, I also tend to be more interested in bioinformatics that
> leads to testable biological hypotheses, and the ultimate goal is to get
> at least some of those hypotheses tested.  As I work in a very specific
> area on very lowbrow bioinformatics, maybe that's not widely applicable.

There's a distinction you're making there that I think could be clarified:
hypothesis generation as distinct from scientific investigation.  The thing
is, though, that anyone can throw up any hypothesis on the basis of whatever
prejudice or belief they like; the method that tests that hypothesis, if
it's the Scientific Method, must be reproducible.  However, if your
hypothesis generator throws up a metaphorical rubber Bigfoot suit, it would
help if you included enough information about your methods to help others
work that out before spending time on it... ;)

An example might help, here:

If all you're doing is using bioinformatics to identify a loose pool of
candidate sequences for function X, and publishing a paper describing only
those sequences demonstrated to have function X, then reproducibility is
perhaps less of an issue.  Bioinformatics in that case is being used as a
hypothesis-generating technique.  No conclusions, strictly-speaking, rest on
its accuracy or reproducibility: all claims made are about the relationship
between a disclosed sequence, and function X.  Still, if any other
interested researchers come along, they may well be interested in knowing
whether your method is likely to have exhausted all possible candidates for
function X; denying them such information is, at best, unhelpful.

However, if the publication stated that all such sequences with function X
had been found, because all members of that initial pool had been tested,
and the described sequences were the only examples with function X, then the
situation is very different.  In that case, a scientific claim is being made
on the basis of the bioinformatic investigation: that no sequences with
function X lie outwith the initial candidate pool.  Under those
circumstances, the validity of the claim being made depends directly on the
validity of the pool selection process.  Here, the bioinformatics is being
used as part of the Scientific Method, and so should be fully-described, and
reproducible.

On 07/03/2009 12:45, "Andrew Dalke" <dalke at dalkescientific.com> wrote:

> Titus:
>> I am mildly skeptical that there's significant value to demanding
>> exact reproducibility in many circumstances.
> 
> There's some quote about how a peer-review publication is only the
> first step to a result being accepted as correct.

And Andrew hits the nail on the head, here.  Peer-review is only the first
*pragmatic* step in the process of external verification, even if the
university bean-counters and employment offices like to think that all
publications with the same impact factor are of equivalent truth and/or
importance, and that publication is an end-point.  Peer review for
publication is really only a filtering step to weed out some of the more
obviously pathological, or less appropriate, requests for publication in a
particular organ.  It is neither a necessary step in the process of external
verification, nor always a good or insightful one.  Watson and Crick's 1953
publication of the structure of DNA was not peer-reviewed, and yet the core
message is entirely valid.  Peer-reviewed papers have been known to be
retracted because they've been shown to be... well... lies.

The actual process of validation of claims sits with the community at large.
The peer-review of Altschul et al.'s publication of gapped BLAST wasn't the
real test of the method: the thousands and millions of BLAST searches that
gave biologically-meaningful results were.  If BLAST were clearly broken, it
wouldn't be used.  In general, that community-wide validation can't be
carried out if the methodology is not available (though, of course,
circumstances for validation differ - in particular for software tools and
algorithms, such as BLAST).

Titus:
> One more note -- Greg Wilson sent me this paper,
> 
> http://www.cs.toronto.edu/~gvwilson/reading/mccullough-lessons-jmcb.pdf
> 
> which makes some interesting arguments regarding reproducibility and
> impact, among other things.

There's a reason (maybe more than one...) why Economics is known as "The
Dismal Science" ;)

Cheers,

L.

-- 
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk       w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C       tel:+44(0)1382 562731 x2405

