[bip] BLAST and FASTA performance

Bruce Southey bsouthey at gmail.com
Sat Jul 19 11:22:12 PDT 2008


On Fri, Jul 18, 2008 at 2:30 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> This is curiosity as I'm preparing my presentation for EuroSciPy
> and updating myself on some of the bioinformatics parts.  I was
> wondering how newer processors, newer instruction sets, and GPUs
> have affected similarity searches.  How long does it take to search
> all of present-day GenBank compared to the equivalent search
> 5 years ago?  What's grown faster, database size or search
> performance?

As I recall, the basic algorithm is linear in the size of the
database, but the score statistics (the e-value) do change with
database size (and query size). So from a performance standpoint the
database size matters only in that search time grows linearly with
it. The real gains come from handling the database better, such as
keeping the BLAST database in memory instead of reloading it for
every search (I think I showed that in some thread).
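
To put the e-value point in numbers, here is a minimal sketch of the
Karlin-Altschul statistic that BLAST-style e-values are based on,
E = K*m*n*exp(-lambda*S). The K and lambda values below are only
illustrative, roughly ungapped-BLASTN-like constants, not anything
from a real search:

    import math

    # Expected number of chance hits with raw score >= S:
    #     E = K * m * n * exp(-lambda * S)
    # K and lambda are placeholders roughly in the range of the
    # ungapped BLASTN defaults; the true values depend on the
    # scoring system.
    K, lam = 0.71, 1.37

    def evalue(score, query_len, db_len):
        """Expected chance hits for a raw score against a database."""
        return K * query_len * db_len * math.exp(-lam * score)

    # Same query, same score: doubling the database size doubles the
    # e-value, but it says nothing about how long the search takes.
    print(evalue(40, 500, 1e9))
    print(evalue(40, 500, 2e9))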

BLAST is essentially an embarrassingly parallel problem (at least
until you have more processors than sequences in the database).
Consequently, more processors will be faster, but communication and
consolidating the results will slow it down. This means expert
programming on the parallel side is essential for fast BLAST
performance - hence the commercial packages.
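
To illustrate the embarrassingly parallel part, here is a rough
Python sketch that splits the queries across worker processes, each
running its own blastall job. The chunk file names and the "nt"
database path are placeholders, not a recommended setup:

    import subprocess
    from multiprocessing import Pool

    # Each worker runs an independent blastall process on its own
    # slice of the queries; the database is shared read-only.
    QUERY_CHUNKS = ["chunk_0.fa", "chunk_1.fa",
                    "chunk_2.fa", "chunk_3.fa"]

    def run_blast(chunk):
        # -p/-d/-i/-o are the classic NCBI toolkit options.
        out = chunk + ".blast"
        subprocess.check_call(["blastall", "-p", "blastn",
                               "-d", "nt", "-i", chunk, "-o", out])
        return out

    if __name__ == "__main__":
        with Pool(processes=4) as pool:
            results = pool.map(run_blast, QUERY_CHUNKS)
        # Merging the per-chunk reports is the serial,
        # communication-bound step mentioned above.
        print(results)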

While I don't know much about using GPUs, I know NVIDIA has its CUDA
platform. I did read a comment that precision may be a problem, since
performance is valued over accuracy - some GPUs don't even support
double precision. Remember that most processors are designed for
general computing tasks, so there is a sacrifice involved because
space on the processor die is limited. This means special-purpose
chips or high-numerical-performance processors (like the Cell
processor) will be faster for that task but perhaps not for others.
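
A quick way to see the single- versus double-precision difference,
assuming NumPy is installed:

    import numpy as np

    # float32 carries ~7 significant decimal digits, float64 ~16,
    # so a small term can vanish entirely in single precision.
    print(np.float32(1.0) + np.float32(1e-8))  # 1.0 - the 1e-8 is lost
    print(np.float64(1.0) + np.float64(1e-8))  # 1.00000001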

The use of different instruction sets is probably somewhat limited
because you need a compiler that supports them and an application
that can use them. So far these have been oriented toward multimedia
applications, where they can make a significant difference when
supported.
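
If you are curious which SIMD instruction sets your own machine
advertises, here is a Linux-only sketch (it assumes /proc/cpuinfo
exists):

    # Read the CPU feature flags the kernel reports.
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":")[1].split())
                break

    for isa in ("sse2", "sse3", "ssse3", "sse4_1", "sse4_2"):
        print(isa, "yes" if isa in flags else "no")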

The most obvious gains have come from moving from 32-bit to 64-bit
processors and from shrinking the processor die. Intel has also
introduced a new division unit with the Penryn chips. But for
processor comparisons I would point to something like the charts at
Tom's Hardware
(http://www.tomshardware.com/charts/processors/xvid-1-1-2,403.html).

For example, on the Deep Fritz 10 benchmark (kilonodes per second),
the current Intel quad-core QX9650 (3.0 GHz) scores 14,784 while the
Pentium 4 530 at 3.0 GHz (released June 2004) scores 1,042 - roughly
a 14-fold improvement. Deep Fritz is a computer chess program
optimized for multiprocessors.

For the most part, computers have not only become faster and cheaper
with more memory, but also have multi-threading ability
(Hyper-Threading, dual cores, etc.), so you can easily argue a
two-fold increase from a quad-core versus a dual-core computer at
about the same price today.

Regards
Bruce


