[bip] Blog post on bioinformatics and Python
Bruce Southey
bsouthey at gmail.com
Thu Sep 18 11:28:12 PDT 2008
C. Titus Brown wrote:
> On Thu, Sep 18, 2008 at 10:55:05AM -0500, Bruce Southey wrote:
> -> >> Use of iterators and what/how to get specific information out of
> -> >> BioPython objects.
> -> >
> -> > Could you clarify these points please? Are you in favour of Biopython
> -> > using python iterators (e.g. via generator functions)? And what
> -> > Biopython objects in particular were you trying to extract data from?
> -> >
> -> Part of it is a lack of understanding but I have not bothered to go
> -> back. So what I say is probably wrong and out of date. I do not really
> -> understand Python iterators and generators as my knowledge is still
> -> mainly Python 2.0 and have not bothered that much with the new language
> -> features. For what I wanted using .next() really was not an option
> -> because I thought that I would need to get specific entries not proceed
> -> in ordered approach. Now I definitely need to access specific entries to
> -> match across files or databases. Today I looked at the BioPython
> -> tutorial Chapter 4 and saw SeqIO.to_dict which would have helped in that
> -> regard.
>
> Why not:
>
> data = list(data)
>
> ? That will take any iterator/generator and turn it into a list.
>
(Ignoring the previous answer given)
Easy: it is a waste of effort to convert one unknown object into a list
of unknown objects that may or may not be the same.
Slightly harder: I need to preselect the data by different criteria (hit,
score, evalue, query name), and now I would have to parse a list of
unknowns...
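
To make the preselection point concrete, here is a minimal sketch of
filtering records while they are being read, so nothing unwanted is ever
stored. The Hit record, the tab-separated layout, and both function names
are assumptions made up for illustration; they are not Biopython objects
or API.

```python
from collections import namedtuple

# Hypothetical record type; a real parser would yield its own objects.
Hit = namedtuple("Hit", ["query", "hit", "score", "evalue"])


def parse_hits(lines):
    """Yield one Hit per line (assumed layout: query, hit, score, evalue)."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        query, hit, score, evalue = line.split("\t")
        yield Hit(query, hit, float(score), float(evalue))


def select_hits(hits, max_evalue=1e-5, min_score=0.0, query=None):
    """Filter lazily: the input is read once and rejects are never kept."""
    for h in hits:
        if h.evalue > max_evalue or h.score < min_score:
            continue
        if query is not None and h.query != query:
            continue
        yield h


# Made-up sample data, standing in for a real results file.
sample = [
    "q1\tsubjA\t250.0\t1e-50",
    "q1\tsubjB\t40.0\t0.5",
    "q2\tsubjA\t120.0\t1e-20",
]

# Only the records that pass the criteria end up in the list.
kept = list(select_hits(parse_hits(sample), max_evalue=1e-10))
```

Because both functions are generators, the selection happens during the
single pass over the input rather than on a list built beforehand.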
> There's no real penalty for doing this (if you need a random access
> list, then you need to fully parse the file anyway!),
But you would be parsing the input multiple times.
> and you can
> convert it into a dictionary pretty easily, too.
>
Sure, but it is a different matter to get adequate keys.
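
As a rough sketch of the key problem: a dictionary can be built in one
pass from any iterator, but only if the chosen key is unique. The
index_by helper and the tuple layout below are hypothetical, written for
illustration only, and are not part of Biopython.

```python
def index_by(records, key_function):
    """Build a dict in one pass; refuse duplicate keys instead of
    silently overwriting an earlier record."""
    index = {}
    for record in records:
        key = key_function(record)
        if key in index:
            raise ValueError("Duplicate key: %r" % (key,))
        index[key] = record
    return index


# Hypothetical (query, hit, score) tuples standing in for parsed records.
records = iter([
    ("q1", "subjA", 250.0),
    ("q1", "subjB", 40.0),
    ("q2", "subjA", 120.0),
])

# The query name alone is not unique here, so the key combines two fields.
by_pair = index_by(records, key_function=lambda r: (r[0], r[1]))
```

Using the query name alone as the key would raise ValueError on this
data, which is exactly the "adequate keys" difficulty.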
I do know that computers are faster and memory is more plentiful than
before (and that is going to change again - Core i7). However, I do try
to code 'efficiently', so converting between multiple data types does
not fit when you can build the desired structure the first time.
Bruce