[bip] Future of bioinformatics in python..?
Titus Brown
titus at caltech.edu
Fri Aug 3 14:29:05 PDT 2007
A few comments, I guess:
- even more than most other hard scientists, biologists are not good
programmers. This means that API and library design by biologists
is a Bad Idea unless it's going to be part of a long-running,
iterative, open process. Almost nobody I know has the patience for
that sort of thing.
- as with Web frameworks, it's too easy to write your own (buggy,
incomplete, naive, and otherwise bad) parsers in Python. This leads
to a proliferation of parsers that serve individual people's needs
but rarely can move beyond that.
- biology is moving very fast, and the solutions for today will in many
cases not be useful tomorrow, except as small building blocks.
- biology is expanding rapidly, so ditto.
- the amount of data coming from sequencing, microarrays, etc. is
becoming seriously intractable.
It's this last point that concerns me the most, at least personally. I
have little fear of iterative development, buggy parsers, or rapid
exploratory development. I'm even reasonably hopeful that I can do a
good job of training people in these areas.
I'm having a really tough time addressing scalability, though. How do
I deal with datasets that are tens if not hundreds of GB in size? Do I
really want to be designing my own naive data structures to deal with
genome- or meta-genome-scale analyses?
This is why I'm "buying into" pygr. I don't know where that will lead yet,
but I'm hopeful.
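
To make the scalability worry concrete, here is a minimal sketch of the
kind of "write your own" approach described above: a generator that
streams FASTA records one at a time so memory use is bounded by the
largest single record rather than by the whole file. The names
iter_fasta and total_length are hypothetical, chosen for illustration;
this is not pygr code and makes no claim about how pygr works.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def iter_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one record at a time.

    Only the current record is held in memory, so the file itself
    can be far larger than RAM.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(chunks)


def total_length(handle: TextIO) -> int:
    """Sum sequence lengths without keeping any record around."""
    return sum(len(seq) for _, seq in iter_fasta(handle))


# Tiny in-memory stand-in for a file on disk.
demo = io.StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGG\n")
records = list(iter_fasta(demo))
```

Even this much is naive: a single chromosome-sized record still has to
fit in memory, and there is no index, so every question about the data
means another full pass over the file. Those are exactly the pieces that
seem better left to a shared, well-tested library.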
cheers,
--titus