[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd
Andrew Dalke
dalke at dalkescientific.com
Tue Feb 5 19:26:33 PST 2008
On Feb 5, 2008, at 9:19 PM, Brent Pedersen wrote:
> my favorite so far is the max() function in [alignment.py]
For those who haven't seen it yet:
def max(a,b,c):
    if a > b:
        if a > c:
            return a
        else:
            return c
    else:
        if b > c:
            return b
        else:
            return c
which the built-in 'max' does just as well.
This is strange! Using this max takes 20.0 seconds, using the built-
in max takes 21.5 seconds. It's faster to use this special max
function. I wonder if it's the extra lookup time to get max from the
builtin namespace... yeah, seems that way.
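One way to check the lookup-time guess is to time the two calls side by side. This is only a rough sketch, written for current Python 3 rather than the Python used in this thread. max3 is a made-up name for the function quoted above, chosen so it doesn't shadow the builtin, and the sample arguments are arbitrary. Which version wins, and by how much, will depend on the interpreter.

```python
import timeit


def max3(a, b, c):
    """Three-way maximum with the same comparisons as the max() in alignment.py."""
    if a > b:
        return a if a > c else c
    return b if b > c else c


def compare(number=200000):
    """Return (seconds for max3, seconds for builtin max) over `number` calls."""
    args = (1.5, -2.0, 0.25)  # arbitrary sample values, not from the paper
    t_local = timeit.timeit(lambda: max3(*args), number=number)
    t_builtin = timeit.timeit(lambda: max(*args), number=number)
    return t_local, t_builtin


if __name__ == "__main__":
    t_local, t_builtin = compare()
    print("max3: %.3fs  builtin max: %.3fs" % (t_local, t_builtin))
```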
alignment.py also does
#F = zeros((n+1,m+1))
F = [[0.0 for x in xrange(m+1)] for y in xrange(n+1)]
so it doesn't use any of Python's numeric libraries, though the
commented-out line suggests the authors experimented with one. Hmm, but
using numarray.zeros makes things slower. A lot slower. I'm not sure
why that's the case.
Replacing the "0.0" with "0" (using int math instead of float), gives
a 10% performance boost and unchanged output.
I agree with Brent - this is a case where psyco would probably do
well. The kernel is
for i in range(1,n+1):
    for j in range(1, m+1):
        F[i][j] = max(F[i-1][j-1]+score(seq1[i-1], seq2[j-1]),
                      F[i][j-1]+e, F[i-1][j]+e)
(replacing the range(1, m+1) with a constant gives negligible change).
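For anyone who wants to run the kernel on its own, here is a minimal self-contained sketch in current Python 3. Only the double loop comes from alignment.py. The score() function, the gap penalty value e, and the border initialization are assumptions filled in from the textbook global-alignment recurrence, and max3 is a renamed copy of the three-way max so the builtin isn't shadowed.

```python
def max3(a, b, c):
    """Three-way maximum, equivalent to the max() defined in alignment.py."""
    if a > b:
        return a if a > c else c
    return b if b > c else c


def score(x, y, match=1, mismatch=-1):
    """Hypothetical substitution score; the paper's scoring scheme may differ."""
    return match if x == y else mismatch


def fill_matrix(seq1, seq2, e=-2):
    """Fill the global-alignment score matrix F (e is an assumed gap penalty)."""
    n, m = len(seq1), len(seq2)
    F = [[0 for _ in range(m + 1)] for _ in range(n + 1)]
    # Assumed border setup: leading gaps cost e each.
    for i in range(1, n + 1):
        F[i][0] = i * e
    for j in range(1, m + 1):
        F[0][j] = j * e
    # The kernel quoted above.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max3(F[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),
                           F[i][j - 1] + e, F[i - 1][j] + e)
    return F


if __name__ == "__main__":
    F = fill_matrix("GATTACA", "GATTACA")
    print(F[-1][-1])
```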
Adding timing statements around the kernel, and changing the string
output to use the MD5 checksum (easier to verify that things didn't
change), gives:
% time python alignment2.py
Start F
End F 17.1918239594
e22c305d849ca59bab4bb180e3511611
2690d2681eef47664120bb9fdc9fcca8
18.149u 0.397s 0:18.63 99.4% 0+0k 0+8io 0pf+0w
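The instrumentation is nothing fancy. A sketch of the idea in current Python 3 follows. The helper names checksum and timed are made up for illustration, hashlib stands in for whatever MD5 module the actual script imported, and the summed range is just a placeholder for the F loop.

```python
import hashlib
import time


def checksum(text):
    """Hex MD5 digest of a string, used only to compare output between runs."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print("Start F")
    result, elapsed = timed(sum, range(1000000))  # placeholder for the F loop
    print("End F", elapsed)
    print(checksum(str(result)))
```

If two versions of the script print the same digests, the alignment output is unchanged, and there's no need to eyeball the long strings.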
Then with the addition of
import psyco
psyco.full()
% time python alignment2.py
Start F
End F 0.862571001053
e22c305d849ca59bab4bb180e3511611
2690d2681eef47664120bb9fdc9fcca8
1.295u 0.366s 0:01.71 96.4% 0+0k 0+29io 0pf+0w
Nice!
Yes, 93% faster. (BTW, to get this performance you *must* have the
Python-defined version of max, and not use the builtin. Otherwise
the time is 5.1 sec instead of 1.7 sec)
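Since the speedup depends on psyco being installed, a guarded import lets the same script run either way. This is a sketch of a common pattern, not something taken from alignment.py, and HAVE_PSYCO is a made-up flag name. On an interpreter without psyco, including any Python 3, it just takes the fallback branch.

```python
try:
    import psyco  # old Python 2 extension; absent on most current systems
    psyco.full()  # ask psyco to compile every function it can
    HAVE_PSYCO = True
except ImportError:
    HAVE_PSYCO = False  # run unaccelerated

print("psyco enabled:", HAVE_PSYCO)
```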
According to the crappy figure 1 -- the scale makes it impossible to
compare the C/C++/Java/C# numbers, and a table would be more helpful
-- the Linux time would drop to about 1.7 seconds. Given the text
"60 fold slower" that means with psyco it should be about "4.5 fold
slower".
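The arithmetic behind that estimate, as a quick sketch. It assumes the 93% figure refers to the user CPU times reported above, and that the paper's "60 fold" scales by the same ratio. The variable names are only for illustration.

```python
user_plain = 18.149  # user seconds without psyco, from the run above
user_psyco = 1.295   # user seconds with psyco, from the run above

fraction = user_psyco / user_plain  # share of the original time still needed
reduction = 1.0 - fraction          # the "93% faster" figure
estimate = 60 * fraction            # rescaled "fold slower" estimate

print("time reduced by %.1f%%" % (100 * reduction))
print("estimated fold slower with psyco: %.1f" % estimate)
```

The result comes out a little over 4, which is in the same range as the "about 4.5" quoted above, given that the input is read off a figure.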
What's the point of the "time.sleep(10000)" at the end?
The paper says:
> As shown previously in the global alignment example, Java and Perl
> can communicate with a program written in C, speeding up the
> program using JNI and XS respectively. For example, if a computer
> intensive command based program written in C needs a graphical
> interface, an easy solution would be to use the Swing library and
> the JNI framework instead of rewriting the whole program in Java.
and omits that Python has the same ability to talk to C.
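As a small illustration, the standard-library ctypes module can call into a C library with no wrapper code at all. This sketch assumes a POSIX system where the C library can be loaded this way, and c_strlen is a made-up helper name. Hand-written extension modules and wrapper generators like SWIG are other routes to the same thing.

```python
import ctypes
import ctypes.util

# Load the C library; on POSIX, CDLL(None) falls back to the running process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the length of an ASCII string as computed by C's strlen."""
    return libc.strlen(text.encode("ascii"))


if __name__ == "__main__":
    print(c_strlen("GATTACA"))
```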
Andrew
dalke at dalkescientific.com