[bip] Blog post on bioinformatics and Python

Peter Clarke resurgo at gmail.com
Sat Sep 20 05:17:30 PDT 2008


OK, my euro's worth of random rant...

I have been using biopython since 2000. I think there are a number of issues.

Firstly, I agree with the need to separate out the code that is not
'production ready'. Having the only 'separation' of code be whether it
is deprecated or not seems odd. Shipping partially functional code as
part of the full system also seems an overly simple approach to code
management, and it is going to leave a lot of new users who are looking
for some particular functionality unhappy with Biopython as a whole.

Secondly, bioinformatics is a broad church. I am personally interested
in networks and comparative sequence analysis, but not at all
interested in protein structure, etc. There is a lot of very good
specialised code in Biopython, but its coverage is patchy, and little
of it is general enough to serve even a small proportion of the people
interested in using Python in biology.

I need parsers to be kept up to date. Maintenance of the Biopython
parser for 'normal' BLAST output was stopped without warning, so
existing code stopped working with no indication of why. Cogent, pygr,
and others manage to keep their parsers up to date.

Class standardisation:
I would like someone to come up with standards for sequence,
alignment, and network classes that everyone accepts and implements
interfaces for. One of Python's strengths is being able to mix and
match code, but this is a lot harder without standards.
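As a rough illustration only (not an existing Biopython API), a shared sequence standard could be expressed in modern Python (3.8+) as a structural interface using typing.Protocol. The names SequenceLike, SimpleSeq and gc_content are hypothetical; the point is that any class from any package that provides an `id` attribute plus `__len__` and `__str__` would satisfy the interface without inheriting from a common base class.

```python
from typing import Protocol, runtime_checkable


# Hypothetical shared interface: NOT part of Biopython or any real library.
@runtime_checkable
class SequenceLike(Protocol):
    """Minimal contract a community 'sequence standard' might require."""

    id: str

    def __len__(self) -> int: ...

    def __str__(self) -> str: ...


class SimpleSeq:
    """Toy class from a hypothetical independent package.

    It does not inherit from SequenceLike; it satisfies the interface
    simply by providing the required attribute and methods.
    """

    def __init__(self, seq_id: str, residues: str) -> None:
        self.id = seq_id
        self.residues = residues

    def __len__(self) -> int:
        return len(self.residues)

    def __str__(self) -> str:
        return self.residues


def gc_content(seq: SequenceLike) -> float:
    """Fraction of G/C residues; works with any SequenceLike object."""
    text = str(seq).upper()
    if not text:
        return 0.0
    return (text.count("G") + text.count("C")) / len(text)


record = SimpleSeq("example_1", "AACCGGTT")
print(isinstance(record, SequenceLike))  # True: structural match
print(gc_content(record))  # 0.5
```

The design choice here is structural typing: packages agree on an interface rather than on a shared base class, which is what would let code from separate projects be mixed and matched.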

In short, I think that Biopython should be fragmented into many smaller
packages. This is probably too much work to do all at once, so a beta
Biopython 2.0 project should be started, focusing initially on
PEP-style (BioPEP?) standards that all biological software in Python
can use. Portions of the original codebase can then be split off one
at a time, with new data standards defined as each one is.

This is a lot of work and needs many people contributing to the
project. That is only going to happen if people feel ownership of
their subprojects, which means a less monolithic control structure.
This could provide a framework that makes the most of Python's
superiority over Perl, rather than just trying to replicate it. I think
that unless Biopython adopts a strategy similar to this, it is going
the way of Numerical Python.

-Pete


On 9/20/08, Peter <biopython at maubp.freeserve.co.uk> wrote:
> Andrew Perry wrote:
>> The problem I have at the moment is when I go to use a Biopython
>> module, I have no idea if this is going to be a well-maintained and
>> nicely working "core" part, or a deprecated, half-implemented, 'slightly
>> broken' or experimental/extra piece.
>
> Well, if it's actually deprecated, when you do an "import" you'll get a
> warning.  I've also started sticking "DEPRECATED" or "OBSOLETE" in the
> doc strings so they show up in the help and on
> http://biopython.org/DIST/docs/api/
>
> We've tried not to add "experimental" bits to Biopython, but there are
> still some *old*, unloved and possibly slightly broken bits... the
> last year or so has seen a gradual "spring clean" going on.
>
>> I've also noticed that a lot of potentially useful code has disappeared
>> from Biopython over the years (wasn't there an HMMER module at one
>> point, or did it never make it in?). That is a good thing if it was
>> really broken and unmaintained, but once it's gone from the mainline
>> distribution, it becomes a case of 'out of sight, out of mind'.
>
> As I recall, all the recent deprecations have been of unmaintained code
> that was either identified as obsolete (e.g. parsers for file
> formats no longer used, or web APIs which are no longer available), in
> need of significant work (e.g. dependent on a third party library
> which has changed in a non-backwards compatible way), or in some cases
> a duplication of functionality elsewhere in Biopython which is better
> maintained or documented.
>
> In the case of an HMMER parser, I don't believe we have one in
> Biopython, but code to do this would be worthwhile.
>
>> Yes, maybe we could go back to earlier CVS revisions to find it... but
>> if it was mostly working and living in an "experimental" package, then
>> there is more chance of someone finding it and fixing it.
>
> If it was "mostly working" it's less likely to have been deprecated and
> dropped in the first place - but I take your point that an
> "experimental" package would be a useful staging post for new code.
>
> Peter wrote:
>>> Would having official Biopython (or BioPerl, etc.) hosted Debian (etc.)
>>> packages help here? In theory you could add this to your list of
>>> repositories and then automatically get official Biopython releases.
>
> Andrew Perry's reply:
>> That's a great idea. I may even be tempted to volunteer to maintain
>> that, if I can get over the learning curve and get started doing proper
>> Debian Python packaging.
>
> Now that could be worth following up. We would have to ask the OBF
> about the hosting side, and try to get the existing Biopython Debian
> packagers' input. Please come over to the biopython (dev) mailing list
> to talk about it.
>
> Peter
>
> _______________________________________________
> biology-in-python mailing list - bip at lists.idyll.org.
>
> See http://bio.scipy.org/ for our Wiki.
>


-- 
Saving the DNA of the world's endangered animals