[bip] BLAST and FASTA performance

Fri Jul 18 12:30:35 PDT 2008

This is out of curiosity, as I'm preparing my presentation for EuroSciPy
and updating myself on some of the bioinformatics parts.  I was
wondering how newer processors, newer instruction sets, and GPUs
have affected similarity searches.  How long does it take to search
all of present-day GenBank compared to the equivalent search
5 years ago?  What's grown faster, database size or search
performance?

The rule of thumb I learned 10 years ago was to use BLAST
because, while FASTA was the purer algorithm, BLAST was much
faster and FASTA gave little sensitivity gain in return.  Over
the years I've seen that most people have chosen BLAST.  Why?
If it's mostly because of the performance, where is the point
where it makes sense to switch back?

http://en.wikipedia.org/wiki/BLAST says
   "BLAST is about 50 times faster than the dynamic programming" of
the Smith-Waterman implementation in FASTA.

Under the section "Accelerated versions" it points to some
FPGA-based versions which are "up to 100x faster".

http://en.wikipedia.org/wiki/Smith-Waterman_algorithm says there are
implementations using the GPU which show "up to a 30 fold speed
increase" and implementations using SSE2 with "speed-ups of close
to 200".
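
For reference, here is a minimal sketch of the dynamic programming
that those speed-ups apply to.  It computes only the best local
alignment score, using a simple match/mismatch score and a linear gap
penalty; the function name and the scoring values are illustrative
assumptions.  Real implementations such as SSEARCH typically use
substitution matrices and affine gap penalties, so treat this as a
picture of the O(len(a) * len(b)) inner loop, not of any particular
program.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (illustrative scoring values)."""
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row; column 0 stays 0
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                     # start a new local alignment
                prev[j - 1] + sub,     # align a[i-1] with b[j-1]
                prev[j] + gap,         # gap in b
                curr[j - 1] + gap,     # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "ACG"))
```

Every cell depends only on its left, upper, and upper-left neighbours,
which is what makes the SSE2 and GPU versions possible: many cells can
be computed in parallel.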

This implies that SSEARCH from FASTA could be several times
faster than BLAST (200 / 50 = 4), assuming both factors are
measured against the same unoptimized baseline.  Or does the
Wikipedia factor of 50 refer to the most optimized versions of
these programs?  Should I add a "citation needed" request to
that number?  :)
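
The back-of-the-envelope arithmetic, spelled out as a small sketch.
The helper name is hypothetical, the factors are the Wikipedia figures
quoted above, and the whole calculation assumes every factor is
measured against the same plain Smith-Waterman baseline, which the
quoted pages do not confirm.

```python
def speed_relative_to_blast(accel_vs_baseline, blast_vs_baseline=50):
    """Ratio of an accelerated Smith-Waterman's speed to BLAST's speed.

    Both arguments are speed-ups over the same (assumed) plain
    Smith-Waterman baseline.  A result above 1 means faster than BLAST.
    """
    return accel_vs_baseline / blast_vs_baseline


print(speed_relative_to_blast(200))  # SSE2 figure: prints 4.0
print(speed_relative_to_blast(30))   # GPU figure: prints 0.6
```

If the baselines differ, for example if the factor of 50 was measured
against an already optimized search, these ratios no longer hold.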

I didn't dig far for numbers.  I did find this comment from
http://www.hpcwire.com/offthewire/17883709.html
> publishing comparisons against named products is problematic as  
> most licenses specifically prohibit use of the product for  
> benchmarking against other products.

				Andrew
				dalke at dalkescientific.com