[bip] parallel recipe and bio libraries.

Sun Feb 3 18:11:02 PST 2008

Hi,
I have not really looked at the code in depth, but Blast uses as many
CPUs as it is told to. It also handles multiple sequences in a single
file, which, in theory, is meant to be more efficient. Disk IO is also
a limiting factor, especially with SMP (dual processors/cores), so I
usually find there is no advantage in doing database formatting in
parallel.
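
To make the point about CPU count concrete, here is a minimal sketch of a single legacy blastall call that takes one multi-sequence FASTA file and asks for several processors through the -a option. The helper name blastall_command and the database name ricetenkmers_chr09 are assumptions made for illustration; only the query and output file names come from the quoted message. Nothing is executed, the function only builds the argument list.

```python
def blastall_command(query, database, output, cpus=2, program="blastn"):
    """Build a legacy NCBI blastall argument list (nothing is executed here)."""
    return [
        "blastall",
        "-p", program,     # which BLAST program to run
        "-d", database,    # database name, assumed already built with formatdb
        "-i", query,       # query FASTA file; it may hold many sequences
        "-o", output,      # file the report is written to
        "-a", str(cpus),   # number of processors BLAST is allowed to use
    ]


# Hypothetical database name; query and output names follow the quoted example.
cmd = blastall_command(
    "ricetenkmers_chr06.fasta",
    "ricetenkmers_chr09",
    "ricetenkmers_chr06_vs_ricetenkmers_chr09.blast",
    cpus=4,
)
print(" ".join(cmd))
```

With the newer BLAST+ programs the equivalent setting is -num_threads. Either way, one process reading one large query file may already keep all cores busy, which is the case being made above.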

So what am I missing here?

Regards
Bruce

On Feb 3, 2008 12:38 PM, Brent Pedersen <bpederse at gmail.com> wrote:
> hi, i saw on the bio.scipy wiki that there's a request for recipes
> doing parallel computations. i have a simple and general script that i
> use for blast jobs. it could easily be used for any "embarrassingly
> parallel" job, as it uses parallelpython/pp to do the work.
> the example is blasting all rice 10,000 mers split into files by
> chromosome against all other chromosomes. it takes files that look
> like: ricetenkmers_chr06.fasta and creates blast output files with
> names like: ricetenkmers_chr06_vs_ricetenkmers_chr09.blast
>
> the code and the conf file are pasted here:
> http://rafb.net/p/byCIjK28.html
>
> any help on making this code (to quote the wiki) not suck? is it bad
> form to use commands module instead of subprocess?
> would it be better to use mpiblast for this sort of thing?
>
>
> also, on an unrelated note. when this list first started, there was
> much discussion on the state of current libraries. i looked at
> biopython, it does seem very large, and much of it hasn't been touched
> for 6 years. corebio doesn't seem active, and other libraries look
> good, but slightly less inviting with (L)GPL. my needs are extremely
> simple: fast feature and sequence access, i.e. getting exons and
> slicing out sequence. and some simple, quick visualization. currently,
> we do this in-house in perl and mysql and gd, but i'd prefer to use
> say a memmapped numpy array and matplotlib. the (lazy-web) question
> being: what bioinformatics tools are out there that are taking
> advantage of all the python goodness? is biopython still the answer?
> should i be looking at pygr for this? i don't really need the n-way
> alignments, only to quickly and easily store, retrieve features and
> sequence.
>
> thanks,
> -brent
>
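
For readers following the thread, here is a rough sketch of the job layout described in the quoted message: every chromosome file paired against every other one, with output names of the form ricetenkmers_chr06_vs_ricetenkmers_chr09.blast. This is not the script from the pasted link. It uses only the Python standard library (a thread pool from concurrent.futures in place of parallelpython/pp, and subprocess in place of the commands module, which was removed in Python 3). The function names pairwise_jobs, run and run_all are made up for this example, and treating the pairs as ordered (A vs B and B vs A) is an assumption.

```python
import itertools
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pairwise_jobs(fasta_files):
    """Yield (query, subject, output_name) for every ordered pair of distinct files."""
    for query, subject in itertools.permutations(fasta_files, 2):
        q = os.path.splitext(os.path.basename(query))[0]
        s = os.path.splitext(os.path.basename(subject))[0]
        yield query, subject, "%s_vs_%s.blast" % (q, s)


def run(cmd):
    """Run one command (a list of arguments, no shell); return (exit_code, output)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into the captured output
        text=True,
    )
    return proc.returncode, proc.stdout


def run_all(commands, workers=2):
    """Run several independent commands at once, keeping results in input order."""
    # Threads are enough here: the real work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    files = ["ricetenkmers_chr06.fasta", "ricetenkmers_chr09.fasta"]
    for query, subject, out_name in pairwise_jobs(files):
        print(query, subject, out_name)
```

If each unordered pair only needs to be searched once, itertools.combinations can replace itertools.permutations. Passing the command as a list avoids shell quoting problems, which is the main practical argument for subprocess over the older commands module.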