[bip] BLAST and FASTA performance

Bruce Southey bsouthey at gmail.com
Mon Jul 21 06:53:05 PDT 2008


A_user wrote:
> Everybody is talking about algorithms. But it is not only the algorithm that matters; the speed of our programming language matters too!
>
> For example, I recently discovered a program called DNA Baser ( http://www.dnabaser.com/ ), which seems to be compiled, as it is only 1.5 MB in size (it can be downloaded without registration).
>
> They are using some very complex algorithm for sequence assembly and the accuracy of the results is amazing. This accuracy can't be achieved using (fast) algorithms that give partial (poor) solutions. And still the program is incredibly fast.
> So, if we don't use a compiled language, we should at least spend some time optimizing our code for speed, not only looking for new algorithms, because I have seen so many poorly written Perl/Python programs.
>
> Now I have decided to learn more about compilers. Dual-core processors have been on the market for a while, and quad-core ones will be here soon too. Compiled code can really take advantage of these new CPU features.
>
Current compilers do support many of these features, and even gcc can now
attempt to generate parallel code automatically (with no guarantee that it
will be faster or correct).  But Python and other dynamic languages may not
fully implement all of these features because of things like multiplatform
support and global locks.
 
You might want to first check out:
Cython (http://cython.org/) for C in Python
F2py (http://www.f2py.org/) for Fortran in Python - is/was part of NumPy

It is harder to suggest anything in terms of parallel computing
because of the complexity involved. If your problem is embarrassingly
parallel then you just need to use threads (or separate processes,
where a global lock gets in the way). You may be able to use high
performance math libraries like MKL (Intel's Math Kernel Library:
http://www.intel.com/cd/software/products/asmo-na/eng/307757.htm).
Otherwise you will need to understand parallel computing and how to get
it into your code.
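
As a rough illustration of the embarrassingly parallel case, here is a
minimal sketch that scores a batch of sequences independently on a pool
of threads. It assumes a Python version that ships the standard
concurrent.futures module; the function names gc_content and
gc_content_parallel and the sample reads are made up for this example.
Because of the global lock, threads in CPython mostly help when the work
waits on I/O or runs in C code that releases the lock; for pure-Python
number crunching, swapping ThreadPoolExecutor for ProcessPoolExecutor
(same interface) is the usual alternative.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    if not seq:
        # Avoid dividing by zero on an empty sequence.
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_content_parallel(sequences, max_workers=4):
    """Score each sequence independently on a pool of worker threads.

    No sequence depends on any other, which is what makes the task
    embarrassingly parallel. max_workers=4 is an arbitrary choice here.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map() returns results in the same order as the inputs.
        return list(pool.map(gc_content, sequences))


if __name__ == "__main__":
    reads = ["ACGT", "GGCC", "ATAT", "gcgcat"]  # hypothetical sample data
    print(gc_content_parallel(reads))
```

Whether this is actually faster than a plain loop depends on the
workload, so it is worth timing both before committing to either.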

Regards
Bruce
PS: Intel's quad-core processors have been available since late 2006.