[bip] Blog post on bioinformatics and Python

Thu Sep 18 04:53:02 PDT 2008

Bruce wrote:
>
> Other important ones include:
>
> Hitting known 'bugs' because some database changed (SwissProt) that
> required workarounds to avoid complete crashes. (Relying on distros to
> provide things that I need does not work especially when someone says
> get the latest version from the svn or the distro provides broken
> packages.)

This isn't a problem in Biopython per se - having to update parsers
due to file format changes (be these from databases or updated
software tools) is something any bioinformatics library has to deal
with. Most "stable" Linux distributions won't track the latest version
of ANY software, so unfortunately if/when some file format next
changes and breaks a parser, you will need to update Biopython
manually - rather than via your distribution's packaging system.
Would officially hosted Biopython (or BioPerl etc.) packages for
Debian (and other distributions) help here?  In theory you could add
the project's repository to your list of package sources and then
automatically get official Biopython releases.  This would be quite a
big effort and we would need people with packaging experience to get
involved.

> Use of iterators and what/how to get specific information out of
> BioPython objects.

Could you clarify these points please?  Are you in favour of Biopython
using Python iterators (e.g. via generator functions)?  And what
Biopython objects in particular were you trying to extract data from?
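
To be clear about what I mean by an iterator built from a generator
function, here is a minimal sketch in plain Python.  It is not
Biopython code: the name iter_fasta and the (title, sequence) tuples
are made up for illustration, and it assumes a simple FASTA-style
input.  The point is only that records are produced one at a time
instead of loading the whole file into memory.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) tuples one record at a time.

    Hypothetical helper for illustration only, not a Biopython API.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header line: hand back the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:]
            chunks = []
        elif title is not None:
            # Sequence line belonging to the current record.
            chunks.append(line)
    # Hand back the final record once the input is exhausted.
    if title is not None:
        yield title, "".join(chunks)


# Made-up example data standing in for a file handle.
example = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2 second\nGGCC\n")
records = iter_fasta(example)
first = next(records)  # only the first record has been parsed so far
rest = list(records)   # consume whatever remains
```

Calling next() pulls a single record, while a for loop (or list())
walks through the rest, so the caller decides how much is parsed.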

> ...
>
> However, stumbling blocks are a little useless without trying to remove
> them.

I agree that identifying stumbling blocks but not trying to deal with
them is useless.  However, the first step here is giving feedback (so
thank you to all those giving feedback on this thread).

> Also referring to other posts that came while writing this. I
> think that BioPython needs to be split. While maintaining multiple
> packages is a problem which is why I like how Scientific Python does
> it. Scientific Python is really NumPy, SciPy and SciPy kits - ignoring
> the fact that scipy has extra dependencies (like some language that I
> don't know) and was/is hard to install (try getting Atlas or when
> distros screwup). Scikits (these were the sandboxes of earlier
> releases), like learn (machine learning), require SciPy but are
> otherwise independent and do develop at a different pace. Really it
> allows updating certain components and avoiding dependencies.

[As an aside, ScientificPython is actually a separate project by
Konrad Hinsen, used in MMTK, which also builds on Numeric/numpy.  This
is nearly as confusing as the fact that "NumPy" was once used as
shorthand for Numeric but now means its replacement library.]

There is something to be said for splitting up a large project, but
there are big downsides too.  In addition to the more complicated
release cycle, the different sub-projects must be coordinated -
especially with inter-module dependencies.  Also, installation is
going to be more confusing for the end user - which bits will they
need?

Peter