[bip] Blog post on bioinformatics and Python

Ryan Raaum ryan.raaum at gmail.com
Wed Sep 17 10:24:20 PDT 2008


> Having one large package to install (with optional dependencies) to
> support your distributed tool is surely easier than dealing with
> several small ones?

Not in most circumstances I have dealt with. I can more easily wrap up
a few small modules with py2exe and distribute that than deal with
Biopython. I can more easily upload a few small modules for a web
application (recently I used Google's App Engine) than deal with
Biopython. Using setuptools, install_requires, and easy_install, and
avoiding anything that has non-managed dependencies, makes using small
tools easy. YMMV.
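
For illustration, the setup.py for one such small module might look
roughly like the sketch below. The package name and its dependency are
made-up placeholders, not real projects; install_requires is the
setuptools keyword that lets easy_install fetch declared dependencies
automatically.

```python
import sys

# Hypothetical metadata for a small single-module package. The package
# and dependency names are placeholders, not real projects.
METADATA = dict(
    name="smallseqtool",
    version="0.1",
    py_modules=["smallseqtool"],
    # Declared dependencies: easy_install reads this list and installs
    # anything that is missing, so nothing is installed by hand first.
    install_requires=["fastaparse>=0.2"],
)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Only hand off to setuptools when a command such as "install" or
    # "sdist" was given on the command line.
    from setuptools import setup
    setup(**METADATA)
```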

> You can install Biopython without Numeric or Numpy installed - and it
> will work fine, assuming you don't use the cluster library, PDB
> parsing or the other numerical bits.  If all you care about is
> sequences, BLAST and Entrez for example, you'll be fine.

In my opinion, the whole point of a big all-in-one
solves-all-my-problems library is that you install it, and then it
solves all your problems. If parts of the library randomly fail (and
it will appear random to someone who is not intimately familiar with
the inner workings and structure of every element of the library) when
"optional" dependencies are not installed, then those dependencies are
not optional. I'll grant that the wrappers for external programs
(clustal, BLAST) are "allowed" to fail without the external program,
because you should only ever try to use them when you already have
that program installed, but nothing else gets that pass. Just offhand,
I wouldn't obviously expect PDB parsing to need Numeric or NumPy the
same way that I should clearly expect the standalone BLAST runner to
require standalone BLAST.
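
One way to make such a failure look less random is to name the missing
dependency, and the feature that needs it, at the point of use. This
is only a sketch and not how Biopython actually does it; the helper
name require_optional and the feature label are hypothetical.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional module, or explain which feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that names both the feature and the
        # missing package, so the failure does not look random.
        raise ImportError(
            "%s requires the '%s' package, which is not installed"
            % (feature, module_name)
        ) from exc
```

A PDB parser could then call require_optional("numpy", "PDB parsing")
before doing any work, so the user sees one clear message instead of a
failure somewhere deep inside the parser.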

Furthermore, optional dependencies should improve functionality, not
introduce it. For example, without optional plotting library X, the
(fictional) DotPlot methods produce ugly (but functional!) ASCII
plots; with optional plotting library X installed, you get nice
.gifs; with super fancy optional plotting library Y, you get
magical 3D floating holograms. If, on the other hand, DotPlot
doesn't work at all without a plotting library, then a plotting
library is a requirement regardless of whether you call it "required"
or "optional".
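
A rough sketch of that kind of graceful degradation follows. DotPlot
is fictional, so the function names here are invented, and matplotlib
merely stands in for "optional plotting library X" (writing a .png
rather than a .gif).

```python
def ascii_dotplot(seq_a, seq_b):
    """Return a plain-text dot plot: '*' where residues match, '.' elsewhere."""
    rows = []
    for a in seq_a:
        rows.append("".join("*" if a == b else "." for b in seq_b))
    return "\n".join(rows)


def dotplot(seq_a, seq_b, outfile="dotplot.png"):
    """Draw a dot plot, using matplotlib if present and ASCII otherwise."""
    try:
        import matplotlib
        matplotlib.use("Agg")  # render to a file, no display needed
        import matplotlib.pyplot as plt
    except ImportError:
        # No plotting library: fall back to the ugly but functional version.
        return ascii_dotplot(seq_a, seq_b)
    # 1 where the residues match, 0 elsewhere; one row per residue of seq_a.
    matrix = [[1 if a == b else 0 for b in seq_b] for a in seq_a]
    fig, ax = plt.subplots()
    ax.imshow(matrix, cmap="Greys", interpolation="nearest")
    fig.savefig(outfile)
    plt.close(fig)
    return outfile
```

Either way the caller gets a plot: dotplot() returns the text itself
when the plotting library is absent, or the name of the image file it
wrote when the library is present.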

When I install something and it doesn't clearly and loudly fail, then
I expect everything to work. If Biopython can be installed without
Numeric or NumPy when significant functionality depends on those
libraries, then I do not think that is a feature, but a bug: the
Biopython installation can silently fail.
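
As a sketch of what failing "clearly and loudly" could look like, an
installer might check for the modules it really needs and stop if any
are absent. The function names are hypothetical and this is not taken
from Biopython's setup script.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def check_or_abort(names):
    """Stop with a clear message if any listed module is missing."""
    missing = missing_modules(names)
    if missing:
        raise SystemExit(
            "Cannot install: missing required modules: " + ", ".join(missing)
        )
```

Calling check_or_abort(["numpy"]) at the top of an install script
would turn a silent partial install into an immediate, explicit error.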

All this being said, these are choices that the Biopython team has
made, and they may be the RIGHT choices for you and many others.
But they are not the right choices for me.

-Ryan



More information about the biology-in-python mailing list