[bip] Blog post on bioinformatics and Python

Bruce Southey bsouthey at gmail.com
Thu Sep 18 11:28:12 PDT 2008


C. Titus Brown wrote:
> On Thu, Sep 18, 2008 at 10:55:05AM -0500, Bruce Southey wrote:
> -> >> Use of iterators and what/how to get specific information out of
> -> >> BioPython objects.
> -> >
> -> > Could you clarify these points please?  Are you in favour of Biopython
> -> > using python iterators (e.g. via generator functions)?  And what
> -> > Biopython objects in particular were you trying to extract data from?
> -> >   
> -> Part of it is a lack of understanding but I have not bothered to go 
> -> back. So what I say is probably wrong and out of date. I do not really 
> -> understand Python iterators and generators as my knowledge is still 
> -> mainly Python 2.0 and have not bothered that much with the new language 
> -> features. For what I wanted using .next() really was not an option 
> -> because I thought that I would need to get specific entries not proceed 
> -> in ordered approach. Now I definitely need to access specific entries to 
> -> match across files or databases. Today I looked at the BioPython 
> -> tutorial Chapter 4 and saw SeqIO.to_dict which would have helped in that 
> -> regard.
>
> Why not:
>
>    data = list(data)
>
> ?  That will take any iterator/generator and turn it into a list.
>   
(Ignoring the previous answer given)
Easy: it is a waste of effort to convert one unknown object into a list 
of unknown objects that may or may not be the same.
Slightly harder: I need to preselect the data by different criteria (hit, 
score, e-value, query name), so I would now have to parse a list of 
unknowns...
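
To make the preselection point concrete, here is a minimal sketch of 
filtering an iterator in a single pass, without first building a list. 
The Hit record, its fields (query, hit, score, evalue), and both 
functions are hypothetical stand-ins for illustration, not Biopython 
classes or calls.

```python
from collections import namedtuple

# Hypothetical record type; a real parser would yield its own objects.
Hit = namedtuple("Hit", ["query", "hit", "score", "evalue"])


def parse_hits(rows):
    """Stand-in for a parser: yields one Hit at a time (an iterator)."""
    for query, hit, score, evalue in rows:
        yield Hit(query, hit, score, evalue)


def select_hits(hits, max_evalue=1e-5, min_score=0.0, query=None):
    """Lazily yield only the hits that pass every criterion."""
    for h in hits:
        if h.evalue > max_evalue or h.score < min_score:
            continue
        if query is not None and h.query != query:
            continue
        yield h


# Made-up example rows: (query name, hit name, score, e-value).
rows = [
    ("q1", "geneA", 250.0, 1e-60),
    ("q1", "geneB", 40.0, 0.5),
    ("q2", "geneA", 180.0, 1e-30),
]

# The input is consumed once; only the selected hits are kept in memory.
selected = list(select_hits(parse_hits(rows), max_evalue=1e-10))
```

Whether this is enough depends on the parser yielding objects whose 
attributes are documented, which is the "list of unknowns" concern above.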

> There's no real penalty for doing this (if you need a random access
> list, then you need to fully parse the file anyway!), 
But you would be parsing the input multiple times.

> and you can
> convert it into a dictionary pretty easily, too.
>   
Sure, but it is a different matter to get adequate keys.
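
As an illustration of the key problem, the sketch below builds a 
dictionary in one pass using a caller-supplied key function and refuses 
duplicate keys. The index_by helper, the dict-based records, and the 
assumption that the accession is the fourth '|'-separated field are all 
hypothetical; Biopython's SeqIO.to_dict accepts a key_function argument 
for a similar purpose.

```python
def index_by(records, key_function):
    """Build {key: record} in one pass; refuse duplicate keys."""
    index = {}
    for record in records:
        key = key_function(record)
        if key in index:
            raise ValueError("duplicate key: %r" % (key,))
        index[key] = record
    return index


# Made-up records standing in for parsed entries.
records = iter([
    {"id": "gi|1|ref|NP_000001.1|", "length": 120},
    {"id": "gi|2|ref|NP_000002.1|", "length": 95},
])

# Assumed key convention: the accession is the fourth '|'-separated field.
by_accession = index_by(records, lambda r: r["id"].split("|")[3])
```

The same keys can then be used to match entries across files or 
databases, provided every source can be reduced to the same key.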

I do know that computers are faster and memory is cheaper than before 
(and that is going to change again with the Core i7). However, I do try 
to code 'efficiently', so converting between multiple data types does 
not fit when you can do it the desired way the first time.

Bruce


