[bip] parallel recipe and bio libraries.

Erich Schwarz emsch at its.caltech.edu
Tue Feb 5 12:51:45 PST 2008


On Tue, 5 Feb 2008, Paul Davis wrote:

> The issue at hand is the 'effective' database length. Effective
> database length is dependent on the query sequence as well.

    But if that's true, and if one wants to enforce an unchanging
product of the effective database length and the query length, then
why not use the argument

        "-Y [real number]"

for NCBI's blastall or blastpgp, which explicitly sets the effective
search space (the size of the database multiplied by the size of the
query)?

    WU-BLAST has no direct equivalent of this, but one could jointly
set values for the query sequence size with "Y=[number]" and the
database size with "Z=[number]", which would then implicitly
determine their product.
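
    As a rough sketch of the two options above, here is how the
command lines might be assembled in Python.  This only builds
argument lists and does not run BLAST.  The function names, file
names, database name, and sizes are all hypothetical placeholders;
the WU-BLAST "Y=" and "Z=" meanings are taken as described above.

```python
# Sketch only: assemble command lines, do not execute them.
# All file names, database names, and sizes below are made-up examples.

def ncbi_blastall_args(query, db, search_space):
    """Legacy NCBI blastall (blastp) with a fixed effective search space."""
    return ["blastall", "-p", "blastp", "-i", query, "-d", db,
            "-Y", str(search_space)]

def wu_blast_args(query, db, query_len, db_len):
    """WU-BLAST blastp with query size (Y=) and database size (Z=) set jointly."""
    return ["blastp", db, query, "Y=%d" % query_len, "Z=%d" % db_len]

if __name__ == "__main__":
    # Hypothetical inputs: a 600-residue query against a 1,000,000-residue db.
    print(" ".join(ncbi_blastall_args("query.fa", "mydb", 600 * 1000000.0)))
    print(" ".join(wu_blast_args("query.fa", "mydb", 600, 1000000)))
```

    Either list could be handed to subprocess once the real paths
and sizes are known; fixing the values this way should keep the
search space constant from one query to the next.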


> I get the following values:
> [...]
> gi|15604718|ref|NP_219502.1|    gi|15834888|ref|NP_296647.1|    48.33
>  629     263     10      1       591     10      614     3e-145   519
> [...]
> gi|15604718|ref|NP_219502.1|    gi|15834888|ref|NP_296647.1|    48.33
>  629     263     10      1       591     10      614     2e-145   519

    But, logarithmically, these E-values are

        10^[ -144.52 ] vs.
        10^[ -144.70 ]

    I don't know whether the errors will always be this small, but
if they were, one might argue that they were practically
inconsequential.  Any sequence analysis whose results would topple
if an E-value's log10 went from -144.52 to -144.70 was probably
pretty tendentious anyway!
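
    The exponents above can be checked with a few lines of Python.
This is just arithmetic on the two E-values quoted in the table;
the variable names are my own.

```python
import math

# The two E-values reported for the same hit in the quoted output.
e_values = [3e-145, 2e-145]

# log10 of each E-value, i.e. the exponent x in 10^x.
logs = [math.log10(e) for e in e_values]

for e, lg in zip(e_values, logs):
    print("%g = 10^[ %.2f ]" % (e, lg))

# The gap between the two, in log10 units.
print("difference: %.2f" % (logs[0] - logs[1]))
```

    The gap is simply log10(3/2), which is tiny next to an exponent
of about -145.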


--Erich



