[bip] Reproducible research

Thu Mar 5 01:25:18 PST 2009

On Wed, 4 Mar 2009, C. Titus Brown wrote:

> On Wed, Mar 04, 2009 at 06:56:12PM +0100, Giovanni Marco Dall'Olio wrote:

> <rant>
> That's because nobody cares if your code is right, just that you can
> publish a good paper on the results from running it.  Once.
> </rant>
>
> Unfortunately doing both good code + good publication is an awful lot of
> hard work, so if it's not rewarded *shrug*.

Amen, Brother Titus.

That's why some famous and not-so-famous journals are so full of sh^H^H
results-on-well-tempered-data-sets that they are close to useless much of
the time. We had to re-implement things and test them over and over in order
to get honest performance figures. And in the end, most of the time it is
better to get several raw data sets and work on them yourself honestly
(properly remove redundancy, use training, testing, and validation sets...)
without even worrying about inventing new methods.
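
A minimal sketch of what such an honest split can look like, assuming the
redundancy step has already grouped similar sequences into clusters. The
function name, the `cluster_of` mapping, and the 60/20/20 fractions are all
hypothetical choices for illustration, not anything prescribed in this
thread. The point is only that whole clusters, not single sequences, get
assigned to each set, so near-duplicates never sit on both sides of a split.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole clusters to train/test/validation sets.

    cluster_of: dict mapping sequence id -> cluster id (hypothetical input,
    assumed to come from an earlier redundancy-reduction step).
    fractions: share of clusters for train and test; the rest is validation.
    """
    # Group sequence ids by their cluster.
    clusters = defaultdict(list)
    for seq_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(seq_id)

    # Shuffle cluster ids reproducibly (fixed seed, sorted starting order).
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    n_train = int(len(cluster_ids) * fractions[0])
    n_test = int(len(cluster_ids) * fractions[1])
    parts = {
        "train": cluster_ids[:n_train],
        "test": cluster_ids[n_train:n_train + n_test],
        "validation": cluster_ids[n_train + n_test:],
    }
    # Expand each group of clusters back into sequence ids.
    return {
        name: sorted(s for c in ids for s in clusters[c])
        for name, ids in parts.items()
    }
```

Splitting at the cluster level rather than the sequence level is the design
choice that matters here: a random per-sequence split would leak close
homologs from training into testing and inflate the reported performance.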

E.g., for many methods published in sequence analysis, it should be
mandatory for the referees to ask the question: how much better do you do
than BLAST itself on this task? It is always surprising to see how many
things are NOT better...

Although I should not complain too much, this is exactly the reason why 
companies hire us: to shovel out bullshit. ;)

--
Ivan Rossi, PhD - ivan AT biodec dot com - ivan dot rossi3 AT unibo dot it
BioDec Srl, Via Calzavecchio 20/2, 40033 Casalecchio di Reno (BO), Italy
Phone: (+39)-051-0548263 - Fax: (+39)-051-7459582 - http://www.biodec.com