[bip] BLAST and FASTA performance
Andrew Dalke
dalke at dalkescientific.com
Fri Jul 18 12:30:35 PDT 2008
This is out of curiosity, as I'm preparing my presentation for EuroSciPy
and updating myself on some of the bioinformatics parts. I was
wondering how newer processors, newer instruction sets, and GPUs
have affected similarity searches. How long does it take to search
all of present-day GenBank compared to the equivalent search
5 years ago? What's grown faster, database size or search
performance?
The rule of thumb I learned 10 years ago was to use BLAST:
FASTA was the purer algorithm, but BLAST was much faster and
gave up little sensitivity. Over the years I see most people
have chosen BLAST. Why? If it's mostly because of the
performance, at what point does it make sense to switch back?
http://en.wikipedia.org/wiki/BLAST says
"BLAST is about 50 times faster than the dynamic programming" of
the Smith-Waterman implementation in FASTA. Under the section
"Accelerated versions" it points to some FPGA-based versions
which are "up to 100x faster".
http://en.wikipedia.org/wiki/Smith-Waterman_algorithm says there are
implementations using the GPU which show "up to a 30 fold speed
increase" and implementations using SSE2 with "speed-ups of close
to 200".
This implies that SSEARCH from FASTA could be several times
faster than BLAST (200 / 50 = 4). Or does the Wikipedia factor
of 50 refer to the most optimized versions of these programs?
Should I add a "citation needed" request to that number? :)
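
To spell out the back-of-the-envelope arithmetic, here is a small
sketch. It assumes, and this is exactly the open question, that the
Wikipedia factors are all measured against the same unoptimized
Smith-Waterman baseline. The function and variable names are made up
for illustration.

```python
def relative_speed(speedup_a, speedup_b):
    """How many times faster A is than B, given each one's
    speedup over a shared baseline (assumed to be the same)."""
    return speedup_a / speedup_b

# Factors as quoted on Wikipedia; each is an "about" or "up to" figure.
blast_vs_baseline = 50.0   # BLAST vs. dynamic programming
sse2_vs_baseline = 200.0   # SSE2 Smith-Waterman, "close to 200"
gpu_vs_baseline = 30.0     # GPU Smith-Waterman, "up to a 30 fold"

print(relative_speed(sse2_vs_baseline, blast_vs_baseline))  # 4.0
print(relative_speed(gpu_vs_baseline, blast_vs_baseline))   # 0.6
```

If the baselines differ, for example if the factor of 50 is against an
already-tuned Smith-Waterman, these ratios don't mean anything.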
I didn't dig far for numbers. I did find this comment from
http://www.hpcwire.com/offthewire/17883709.html
> publishing comparisons against named products is problematic as
> most licenses specifically prohibit use of the product for
> benchmarking against other products.
>
Andrew
dalke at dalkescientific.com