[bip] BLAST and FASTA performance

Sat Jul 19 12:20:50 PDT 2008

On Sat, Jul 19, 2008 at 01:22:12PM -0500, Bruce Southey wrote:
-> On Fri, Jul 18, 2008 at 2:30 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
-> > I'm asking out of curiosity as I'm preparing my presentation for EuroSciPy
-> > and updating myself on some of the bioinformatics parts.  I was
-> > wondering how newer processors, newer instruction sets, and GPUs
-> > have affected similarity searches.  How long does it take to search
-> > all of present-day GenBank compared to the equivalent search
-> > 5 years ago?  What's grown faster, database size or search
-> > performance?
-> 
-> As I recall, the basic algorithm is linear in the size of the
-> database. But the score statistics (e-value) do change with database
-> size (and query size). So really the database size is irrelevant from
-> the performance aspect. The gains come more from handling the
-> database better, such as keeping the BLAST database in memory instead
-> of reloading it from scratch every time (I think I showed that in some thread).

Are you sure?  The algorithm itself shouldn't be linear in the size of
the database, although some of the preparatory steps (constructing the
hash table of words) are, of course.
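
To make the distinction concrete, here is a toy sketch in Python. It is
not BLAST's actual code: the function names, the k=3 word size, and the
choice to index the database side are all assumptions for illustration.
Building the word table touches every position once, so it is linear in
the total sequence length. The lookup step, by contrast, costs one hash
probe per query word plus the number of hits returned.

```python
from collections import defaultdict


def build_word_index(sequences, k=3):
    """Map each length-k word to the (sequence id, offset) pairs where it occurs.

    Hypothetical helper: one pass over every position, so linear in total length.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def find_seeds(index, query, k=3):
    """Return (query offset, sequence id, database offset) for every word hit.

    Hypothetical helper: one hash lookup per query word, plus the hits found.
    """
    seeds = []
    for q_off in range(len(query) - k + 1):
        for seq_id, db_off in index.get(query[q_off:q_off + k], ()):
            seeds.append((q_off, seq_id, db_off))
    return seeds


# Made-up toy data, not real GenBank sequences.
database = ["ACGTACGT", "TTGACGTA"]
index = build_word_index(database)
seeds = find_seeds(index, "GACG")
```

Of course the number of hits to extend still tends to grow with the
database, so this only shows that the seeding step need not scan
everything.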

--titus