[bip] Blog post on bioinformatics and Python

Bruce Southey bsouthey at gmail.com
Thu Sep 18 08:55:05 PDT 2008


Peter wrote:
> Bruce wrote:
>   
>> Other important ones include:
>>
>> Hitting known 'bugs' because some database changed (SwissProt) that
>> required workarounds to avoid complete crashes. (Relying on distros to
>> provide things that I need does not work especially when someone says
>> get the latest version from the svn or the distro provides broken
>> packages.)
>>     
>
> This isn't a problem in Biopython per se - having to update parsers
> due to file format changes (be these from databases or updated
> software tools) is something any bioinformatics library has to deal
> with. Most "stable" Linux distributions won't track the latest version
> of ANY software, so unfortunately if/when some file format next
> changes and breaks a parser, you will need to update Biopython
> manually - rather than via your distribution's packaging system.
> Would having official Biopython (or BioPerl etc) hosted debian (etc)
> packages help here?  In theory you could add this to your list of
> repositories and then automatically get official Biopython releases.
> This would be quite a big effort and we would need people with
> packaging experience to get involved.
>   
I am not the person to ask. While I use distro packages for most of
the system, I install the important software, including svn versions,
from source (I've been using Linux a long, long time). The worst part is
getting all those dependencies installed, which argues for at least a
core component that does not use any dependencies.

This is a community effort and requires people trained to do it. I don't
fully remember, but recently SUSE and Fedora (?) were offering ways to
repackage software for different distributions.


>> Use of iterators and what/how to get specific information out of
>> BioPython objects.
>>     
>
> Could you clarify these points please?  Are you in favour of Biopython
> using python iterators (e.g. via generator functions)?  And what
> Biopython objects in particular were you trying to extract data from?
>   
Part of it is a lack of understanding, but I have not bothered to go
back, so what I say is probably wrong and out of date. I do not really
understand Python iterators and generators, as my knowledge is still
mainly Python 2.0 and I have not bothered much with the newer language
features. For what I wanted, using .next() really was not an option,
because I thought I would need to get specific entries rather than step
through them in order. Now I definitely need to access specific entries
to match across files or databases. Today I looked at the BioPython
tutorial, Chapter 4, and saw SeqIO.to_dict, which would have helped in
that regard.
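
The difference between stepping through records with next() and looking
them up by ID can be sketched without Biopython. This is a minimal
stand-in, assuming plain FASTA input and modern Python 3 (where
next(iterator) replaces the old .next() method). parse_fasta is a
generator: it yields one record at a time, so next() only moves forward.
The hypothetical to_dict helper loads every record into a dictionary
keyed by the first word of each header, which is roughly what
SeqIO.to_dict does with real SeqRecord objects; in Biopython itself the
equivalent call would be SeqIO.to_dict(SeqIO.parse(handle, "fasta")).

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def parse_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Generator: yield (record_id, sequence) pairs one record at a time."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Key on the first word of the header line.
            words = line[1:].split()
            record_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def to_dict(records: Iterator[Tuple[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: load all records into a dict for random access."""
    lookup: Dict[str, str] = {}
    for record_id, seq in records:
        if record_id in lookup:
            raise ValueError(f"Duplicate key {record_id!r}")
        lookup[record_id] = seq
    return lookup


if __name__ == "__main__":
    sample = ">seq1 first entry\nACGT\nGG\n>seq2\nTTAA\n"  # made-up input

    # Sequential access: next() only moves forward through the file.
    stream = parse_fasta(io.StringIO(sample))
    print(next(stream))  # ('seq1', 'ACGTGG')

    # Random access: look entries up by ID, e.g. to match across files.
    lookup = to_dict(parse_fasta(io.StringIO(sample)))
    print(lookup["seq2"])  # TTAA
```

The trade-off is memory: the generator holds one record at a time,
while the dictionary holds the whole file in memory at once.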

I tend to BLAST multiple sequences at the same time (it is faster than
one at a time), so .next() is not an option (and I do not see a to_dict
option in the BLAST part of the tutorial). At the time I was also trying
to extract less common things like 'Hsp_hit-frame' (?) that I could not
find in the parsed output. I thought I would use BioPython 1.47 (not
sure how to find out the version under Python) to check this, so I just
tried to run the tutorial code in Section 6.6.2, 'Parsing a file full of
BLAST runs', on one of my XML files. First problem: undefined variables
(I filed a bug, now fixed!!). Second problem: 'ValueError: Unexpected
end of stream', whose cause is hard to determine. However, this may be
due to using BLAST version 2.2.18 (released March 08), as I think
something similar happened when I first tried BioPython. It just
highlights a frustration with using the BioPython codebase: there is no
clue to the problem or the solution (could BioPython at least track
which versions of BLAST are known to work?).
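
Both problems above (pulling fields such as Hsp_hit-frame out of a
multi-query run, and working out why a file will not parse) can be
poked at directly in the BLAST XML. The sketch below uses only the
standard library's xml.etree.ElementTree rather than Biopython's
Bio.Blast.NCBIXML parser, and assumes a single <BlastOutput> document
with one <Iteration> element per query, which may not hold for every
BLAST release. hit_frames_by_query and blast_version_and_error are
hypothetical helpers: the first builds a dictionary keyed by query
definition, the second reports the BlastOutput_version string and
whether the XML is complete. biopython_version reads Bio.__version__,
which answers the "which version am I running" question.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def hit_frames_by_query(xml_text: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each query definition to a list of (hit definition, hit frame)."""
    root = ET.fromstring(xml_text)
    results: Dict[str, List[Tuple[str, int]]] = {}
    # Assumes one <Iteration> element per query in a single document.
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def", default="")
        pairs = results.setdefault(query, [])
        for hit in iteration.iter("Hit"):
            hit_def = hit.findtext("Hit_def", default="")
            for hsp in hit.iter("Hsp"):
                # Hsp_hit-frame may be absent, so skip HSPs that lack it.
                frame = hsp.findtext("Hsp_hit-frame")
                if frame is not None:
                    pairs.append((hit_def, int(frame)))
    return results


def blast_version_and_error(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return the BlastOutput_version text and any XML parse error message."""
    version = None
    try:
        data = io.BytesIO(xml_text.encode("utf-8"))
        for _event, elem in ET.iterparse(data):  # "end" events by default
            if elem.tag == "BlastOutput_version":
                version = elem.text
    except ET.ParseError as exc:
        # Truncated or malformed XML fails here; the version may already be known.
        return version, str(exc)
    return version, None


def biopython_version() -> Optional[str]:
    """Return the installed Biopython version, or None if Bio is missing."""
    try:
        import Bio
    except ImportError:
        return None
    return Bio.__version__


if __name__ == "__main__":
    # Made-up, minimal BLAST XML for illustration only.
    sample = (
        "<BlastOutput><BlastOutput_version>TBLASTX 2.2.18</BlastOutput_version>"
        "<BlastOutput_iterations><Iteration>"
        "<Iteration_query-def>query1</Iteration_query-def>"
        "<Iteration_hits><Hit><Hit_def>hitA</Hit_def><Hit_hsps>"
        "<Hsp><Hsp_hit-frame>-2</Hsp_hit-frame></Hsp>"
        "</Hit_hsps></Hit></Iteration_hits></Iteration>"
        "</BlastOutput_iterations></BlastOutput>"
    )
    print(biopython_version())
    print(hit_frames_by_query(sample))  # {'query1': [('hitA', -2)]}
    print(blast_version_and_error(sample))  # ('TBLASTX 2.2.18', None)
    truncated = sample[: sample.index("<Iteration_hits>")]
    print(blast_version_and_error(truncated))  # version kept, plus an error message
```

Returning the version alongside the parse error means a truncated file
still tells you which BLAST build produced it, which is the kind of
information a library could compare against the versions its own tests
cover.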

Bruce




More information about the biology-in-python mailing list