[bip] parallel recipe and bio libraries.

Brent Pedersen bpederse at gmail.com
Sun Feb 3 10:38:15 PST 2008


hi, i saw on the bio.scipy wiki that there's a request for recipes
doing parallel computations. i have a simple and general script that i
use for blast jobs. it could easily be used for any "embarrassingly
parallel" job, as it uses parallelpython/pp to do the work.
the example blasts all rice 10,000-mers, split into one file per
chromosome, against every other chromosome. it takes files named
like ricetenkmers_chr06.fasta and creates blast output files named
like ricetenkmers_chr06_vs_ricetenkmers_chr09.blast

the code and the conf file are pasted here:
http://rafb.net/p/byCIjK28.html
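
for readers without access to the paste, here is a minimal sketch of
the all-vs-all pattern described above. it is not the pasted script:
it swaps parallelpython/pp for the standard library
(concurrent.futures plus subprocess), and it assumes the legacy
blastall command line with each chromosome file already formatted as
a database by formatdb. the function names, the blastn program choice,
the tabular output flag and the choice of ordered pairs (chr06 vs
chr09 and chr09 vs chr06 both run) are all illustrative assumptions.

```python
import itertools
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def output_name(query, subject):
    """Build e.g. ricetenkmers_chr06_vs_ricetenkmers_chr09.blast."""
    q = os.path.splitext(os.path.basename(query))[0]
    s = os.path.splitext(os.path.basename(subject))[0]
    return "%s_vs_%s.blast" % (q, s)


def all_vs_all(fasta_files):
    """Every ordered (query, subject) pair of distinct files (assumption)."""
    return list(itertools.permutations(sorted(fasta_files), 2))


def build_command(query, subject, out_dir="."):
    """Argument list for one job. Assumes legacy blastall and that
    `subject` was already formatted with formatdb (assumption)."""
    out = os.path.join(out_dir, output_name(query, subject))
    return ["blastall", "-p", "blastn", "-i", query, "-d", subject,
            "-m", "8", "-o", out]


def run_job(cmd):
    """Run one job without a shell; return (output file, exit status)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd[-1], result.returncode


def run_all(fasta_files, workers=4):
    """Run all jobs, `workers` at a time. Threads are enough here
    because the real work happens in the child blast processes."""
    cmds = [build_command(q, s) for q, s in all_vs_all(fasta_files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, cmds))
```

passing the command as a list to subprocess avoids shell quoting
problems with file names, which is the usual argument for subprocess
over building a command string.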

any help on making this code (to quote the wiki) not suck? is it bad
form to use the commands module instead of subprocess?
would it be better to use mpiblast for this sort of thing?


also, on an unrelated note: when this list first started, there was
much discussion on the state of current libraries. i looked at
biopython; it does seem very large, and much of it hasn't been touched
for 6 years. corebio doesn't seem active, and other libraries look
good, but slightly less inviting with (L)GPL. my needs are extremely
simple: fast feature and sequence access, i.e. getting exons and
slicing out sequence, and some simple, quick visualization. currently,
we do this in-house in perl and mysql and gd, but i'd prefer to use
say a memmapped numpy array and matplotlib. the (lazy-web) question
being: what bioinformatics tools are out there that are taking
advantage of all the python goodness? is biopython still the answer?
should i be looking at pygr for this? i don't really need the n-way
alignments, only to quickly and easily store and retrieve features and
sequence.
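
to make the "fast sequence access" need concrete, here is a minimal
sketch of slicing features out of a memory-mapped sequence file. it
uses the standard library mmap module as a stand-in for a memmapped
numpy array, which would be indexed the same way. the flat file layout
(raw bases, no header, no newlines), the 0-based half-open (start, end)
coordinates, the file name and the function names are all assumptions
for illustration.

```python
import mmap
import os
import tempfile


def write_flat(path, sequence):
    """Store a sequence as raw ASCII bytes: no header, no newlines."""
    with open(path, "wb") as fh:
        fh.write(sequence.encode("ascii"))


def fetch(path, features):
    """Return the sequence of each (start, end) feature, where
    coordinates are 0-based and half-open (assumption). Only the
    requested slices are read from the mapped file."""
    with open(path, "rb") as fh:
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return [mm[start:end].decode("ascii") for start, end in features]
        finally:
            mm.close()


# toy data standing in for a chromosome and two exons
demo_path = os.path.join(tempfile.mkdtemp(), "chr06.flat")
write_flat(demo_path, "ACGTACGTTTGACCA")
exons = [(0, 4), (8, 11)]
print(fetch(demo_path, exons))  # ['ACGT', 'TTG']
```

because the file has no line breaks, a feature's coordinates map
directly to byte offsets, so no index is needed to slice out an exon.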

thanks,
-brent


