[bip] Future of bioinformatics in python..?

Fri Aug 3 14:29:05 PDT 2007

A few comments, I guess:

 - even more than most other hard scientists, biologists are not good
   programmers.  This means that API and library design by biologists
   is a Bad Idea unless it's going to be part of a long-running,
   iterative, open process.  Almost nobody I know has the patience for
   that sort of thing.

 - as with Web frameworks, it's too easy to write your own (buggy,
   incomplete, naive, and otherwise bad) parsers in Python.  This leads
   to a proliferation of parsers that serve individual people's needs
   but rarely move beyond them.

 - biology is moving very fast, and today's solutions will in many
   cases not be useful tomorrow, except as small building blocks.

 - biology is expanding rapidly, so ditto.

 - the amount of data coming from sequencing, microarrays, etc. is
   becoming seriously intractable.

It's this last point that concerns me the most, at least personally.  I
have little fear of iterative development, buggy parsers, or rapid
exploratory development.  I'm even reasonably hopeful that I can do a
good job of training people in these areas.

I'm having a really tough time addressing scalability, though.  How do
I deal with datasets that are tens if not hundreds of GB in size?  Do I
really want to be designing my own naive data structures to deal with
genome- or metagenome-scale analyses?

This is why I'm "buying into" pygr.  I don't know where that will lead
yet, but I'm hopeful.

cheers,
--titus