[bip] Blog post on bioinformatics and Python

Andrew Perry ajperry at pansapiens.com
Fri Sep 19 23:47:36 PDT 2008


Phew .. big thread and I'm very late to the party.
Thought I'd chime in with some of my thoughts, as a long time 'casual' user
of Biopython.

>Not in most circumstances I have dealt with. I can more easily wrap up
>a few small modules with py2exe and distribute that than deal with
>BioPython. I can more easily upload a few small modules for a web
>application - recently I used Google's App Engine - than deal with
>BioPython. Using setuptools, install_requires, easy_install, and
>avoiding anything that has non-managed dependencies makes using small
>tools easy. YMMV.

While I wouldn't say it was trivial, I've used Biopython with Google App
Engine, and was able to treat the parts I needed as "a few small modules". I
pulled down the CVS version of Biopython and put "Bio" into my app
directory. Since I was only using the EUtils / Entrez stuff I just figured
out which parts of Biopython I needed by looking at which modules imported
each other. I was able to delete most of Biopython that I didn't need, apply my
own fixes (to use GAE's urlfetch), and only uploaded a very small portion to
GAE for hosting. If you are curious, it's in SVN here:
http://code.google.com/p/resolveref/source/browse . Of course if C
extensions or external programs were required it wouldn't have been
possible, but it turned out less painful than I expected. Yes, YMMV.
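
For anyone wanting to try the same trimming, the "look at which modules
import each other" step can be roughly automated. What follows is only a
minimal sketch, not what I actually did for resolveref: the function names
(find_modules, direct_imports, needed_modules) are made up for illustration,
it only follows absolute imports, and it assumes the source files parse
under the Python you run it with.

```python
import ast
import os


def module_name(root, path):
    """Turn a .py file path under root into a dotted module name."""
    parts = os.path.relpath(path, root)[:-3].split(os.sep)  # drop ".py"
    if parts[-1] == "__init__":
        parts = parts[:-1]  # a package is named after its directory
    return ".".join(parts)


def find_modules(root, package):
    """Map dotted module names to file paths for every .py file in package."""
    modules = {}
    for dirpath, _dirs, files in os.walk(os.path.join(root, package)):
        for fname in files:
            if fname.endswith(".py"):
                path = os.path.join(dirpath, fname)
                modules[module_name(root, path)] = path
    return modules


def direct_imports(path, modules):
    """Return the in-package modules that the file at path imports.

    Assumption: only absolute imports are followed; relative imports
    (and anything imported dynamically) are ignored.
    """
    with open(path) as handle:
        tree = ast.parse(handle.read(), filename=path)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a import b" may refer to module a.b or to a name inside a
            names = [node.module]
            names += [node.module + "." + alias.name for alias in node.names]
        else:
            continue
        for name in names:
            # importing a.b.c also pulls in the packages a and a.b
            parts = name.split(".")
            for i in range(1, len(parts) + 1):
                prefix = ".".join(parts[:i])
                if prefix in modules:
                    found.add(prefix)
    return found


def needed_modules(root, package, start):
    """Return every in-package module reachable from the start module."""
    modules = find_modules(root, package)
    needed, todo = set(), [start]
    while todo:
        name = todo.pop()
        if name in needed or name not in modules:
            continue
        needed.add(name)
        if "." in name:
            todo.append(name.rsplit(".", 1)[0])  # parent package is needed too
        todo.extend(direct_imports(modules[name], modules))
    return needed
```

Run from the directory containing "Bio", something like
needed_modules(".", "Bio", "Bio.Entrez") would list the modules to keep;
everything else under "Bio" is a candidate for deletion. Treat the result as
a starting point and test the trimmed copy, since imports done at runtime
won't show up.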

Bruce Southey wrote:
>I think that BioPython needs to be split.
..snip...
>Sequence stuff (handling sequences and database records; addressing
>BLAST and multiple alignment etc.would be one component. I'll split
>these further but probably no gain and most people would want both
>anyhow. The second part would be things like logistic regression,
>cluster, and microarray-related affy stuff as well as most of the topics
>covered in Jason Kinser's book

(I wrote this before I caught up with the whole thread. Some of this
reiterates things others have suggested [eg Ryan Raaum's suggestions]).
I agree that Biopython could benefit from being split up into some smaller
sub-packages (eg eggs), but maybe of a different nature. I think it would
help to have a "core" package (well tested, maintained and documented
parsers for common sequence formats [largely the cookbook stuff], _pure
python only_, only depends on itself), an "extras" package (for all the less
often used stuff that still works, but might have become unmaintained or
undocumented and may require C extensions), and an "experimental" package
(for less well tested stuff that wants to make it into "core" or "extras"
one day). Something reminiscent of the Debian stable/unstable/testing or
Ubuntu main/universe/multiverse type divisions.

The problem I have at the moment is when I go to use a Biopython module, I
have no idea if this is going to be a well maintained and nicely working
"core" part, or a deprecated, half implemented, 'slightly broken' or
experimental/extra piece. I've also noticed that a lot of potentially useful
code has disappeared from Biopython over the years (wasn't there a HMMER
module at one point, or did it never make it in ?). That is a good thing if
it was really broken and unmaintained, but once it's gone from the mainline
distribution, it becomes a case of 'out of sight out of mind'. Yes, maybe we
could go back to earlier CVS revisions to find it .. but if it was mostly
working and living in an "experimental" package, then there is more chance
of someone finding it and fixing it.
I think the 'split it up' idea certainly warrants more discussion.

Bruce wrote:
>Would having official Biopython (or BioPerl etc) hosted debian (etc)
>packages help here?  In theory you could add this to your list of
>repositories and then automatically get official Biopython releases.

That's a great idea. I may even be tempted to volunteer to maintain that, if
I can get over the learning curve and get started doing proper Debian python
packaging.

Bruce Southey wrote:
>Just to update and note that I will move any further discussion to the
>biopython list.

Sensible idea. Me too.

Andrew Perry


On Fri, Sep 19, 2008 at 11:48 PM, Bruce Southey <bsouthey at gmail.com> wrote:

> Hi,
> Just to update and note that I will move any further discussion to the
> biopython list.
>
> I got biopython 1.48 installed and noted a few trivial issues along the
> way that Peter has already addressed most of these. (Although the
> Python2.5 + Numeric bug probably will not be.) Basically BioPython needs
> better documentation and examples - perhaps along the lines of numpy's
> examples (http://www.scipy.org/Numpy_Example_List_With_Doc).
>
> Regards
> Bruce