[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd

Andrew Dalke dalke at dalkescientific.com
Tue Feb 5 19:26:33 PST 2008


On Feb 5, 2008, at 9:19 PM, Brent Pedersen wrote:
> my favorite so far is the max()  function in  [alignment.py]


For those who haven't seen it yet:

def max(a,b,c):
    if a > b:
        if a > c:
            return a
        else:
            return c
    else:
        if b > c:
            return b
        else:
            return c


which the built-in 'max' does just as well.

This is strange!  Using this max takes 20.0 seconds, while using the
built-in max takes 21.5 seconds, so the special max function is
actually faster.  I wonder if it's the extra lookup time to get max
from the builtin namespace... yeah, seems that way.
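
A rough way to poke at the lookup-time guess is to time the same call
three ways with timeit.  This is only a sketch in current Python 3
syntax (the script under discussion is Python 2), the name max3 and the
sample values are made up for illustration, and newer interpreters
cache global lookups, so the gap may well not show up at all:

```python
import timeit

def max3(a, b, c):
    """Three-way max written out by hand, like the one in alignment.py."""
    if a > b:
        return a if a > c else c
    return b if b > c else c

stmt = "max(1.5, -2.0, 0.25)"
n = 200000

# 1. 'max' is not in globals, so it is found in the builtin namespace.
t_builtin = timeit.timeit(stmt, globals={}, number=n)
# 2. The builtin bound to a global name, skipping the builtin lookup.
t_global = timeit.timeit(stmt, globals={"max": max}, number=n)
# 3. The hand-written Python function bound to the same global name.
t_python = timeit.timeit(stmt, globals={"max": max3}, number=n)

print("builtin lookup: %.3fs" % t_builtin)
print("builtin as global: %.3fs" % t_global)
print("hand-written: %.3fs" % t_python)
```

The absolute numbers depend heavily on the machine and Python version,
so treat them as a relative comparison only.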


alignment.py also does
         #F = zeros((n+1,m+1))
         F = [[0.0 for x in xrange(m+1)] for y in xrange(n+1)]

so it doesn't use any of Python's numeric libraries, though the
commented-out line suggests they experimented with one.  Hmm, but using
numarray.zeros makes things slower.  A lot slower.  I'm not sure why
that's the case.

Replacing the "0.0" with "0" (using int math instead of float) gives
a 10% performance boost and unchanged output.


I agree with Brent - this is a case where psyco would probably do
well.  The kernel is

     for i in range(1,n+1):
         for j in range(1, m+1):
             F[i][j] = max(F[i-1][j-1]+score(seq1[i-1], seq2[j-1]),
                           F[i][j-1]+e, F[i-1][j]+e)

(replacing the range(1, m+1) with a constant gives negligible change).
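
To make the kernel runnable on its own, here is a minimal sketch in
Python 3.  The score function (+1 match, -1 mismatch), the gap penalty
e = -2, and the boundary initialization are all assumptions for
illustration; the values alignment.py actually uses are not shown here:

```python
def score(a, b):
    # Hypothetical scoring scheme: +1 for a match, -1 for a mismatch.
    return 1 if a == b else -1

def fill_matrix(seq1, seq2, e=-2):
    """Fill a global-alignment score matrix F with linear gap penalty e."""
    n, m = len(seq1), len(seq2)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary condition: each leading gap costs e.
    for i in range(1, n + 1):
        F[i][0] = i * e
    for j in range(1, m + 1):
        F[0][j] = j * e
    # Same recurrence as the kernel quoted above.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i-1][j-1] + score(seq1[i-1], seq2[j-1]),
                          F[i][j-1] + e,
                          F[i-1][j] + e)
    return F

F = fill_matrix("GATTACA", "GCATGCU")
print(F[-1][-1])  # bottom-right cell holds the global alignment score
```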

Adding timing statements around the kernel, and changing the string
output to use the MD5 checksum (easier to verify that things didn't
change), gives:


% time python alignment2.py
Start F
End F 17.1918239594
e22c305d849ca59bab4bb180e3511611
2690d2681eef47664120bb9fdc9fcca8
18.149u 0.397s 0:18.63 99.4%    0+0k 0+8io 0pf+0w
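
The Start/End lines and the checksums could be produced by something
like the following.  It is a sketch in Python 3, not the actual
contents of alignment2.py; the helper names timed, checksum and
toy_kernel are hypothetical, and toy_kernel only stands in for the real
alignment code:

```python
import hashlib
import time

def checksum(text):
    """MD5 hex digest of a string; used only to detect changed output."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def timed(label, func, *args):
    """Call func(*args) and print Start/End lines with elapsed seconds."""
    print("Start", label)
    start = time.time()
    result = func(*args)
    print("End", label, time.time() - start)
    return result

def toy_kernel(n):
    # Hypothetical stand-in for the real alignment kernel.
    return ",".join(str(i * i) for i in range(n))

output = timed("F", toy_kernel, 1000)
print(checksum(output))
```

If an optimization changes the output at all, the digest changes, which
is much easier to spot than a difference buried in a long alignment
string.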

Then with the addition of

import psyco
psyco.full()


% time python alignment2.py
Start F
End F 0.862571001053
e22c305d849ca59bab4bb180e3511611
2690d2681eef47664120bb9fdc9fcca8
1.295u 0.366s 0:01.71 96.4%     0+0k 0+29io 0pf+0w

Nice!

Yes, 93% faster.  (BTW, to get this performance you *must* use the
Python-defined version of max and not the builtin.  Otherwise
the time is 5.1 sec instead of 1.7 sec.)
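
The 93% figure appears to correspond to the user CPU times of the two
runs; the kernel-only and wall-clock times give slightly different
percentages.  A small sketch of the arithmetic, using the numbers
copied from the output above (the function name percent_faster is made
up here):

```python
def percent_faster(before, after):
    """Percentage reduction in run time going from before to after."""
    return 100.0 * (1.0 - after / before)

# (before, after) pairs in seconds, copied from the two runs above.
runs = {
    "kernel": (17.1918239594, 0.862571001053),
    "user CPU": (18.149, 1.295),
    "wall clock": (18.63, 1.71),
}
for name, (before, after) in runs.items():
    print("%-10s %.1f%% faster, %.1fx speedup"
          % (name, percent_faster(before, after), before / after))
```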


According to the crappy figure 1 -- the scale makes it impossible to  
compare the C/C++/Java/C# numbers, and a table would be more helpful  
-- the Linux time would drop to about 1.7 seconds.  Given the text  
"60 fold slower" that means with psyco it should be about "4.5 fold  
slower".



What's the point of the "time.sleep(10000)" at the end?



The paper says:
> As shown previously in the global alignment example, Java and Perl  
> can communicate with a program written in C, speeding up the  
> program using JNI and XS respectively. For example, if a computer  
> intensive command based program written in C needs a graphical  
> interface, an easy solution would be to use the Swing library and  
> the JNI framework instead of rewriting the whole program in Java.



and omits that Python has the same ability to talk to C.
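
As one illustration, the standard library's ctypes module can call into
a C library directly, with no wrapper code to compile.  This is a
minimal sketch that assumes a POSIX system (Linux or Mac OS X) where the
C runtime can be located; it calls strlen from libc only as a stand-in
for a real compute-intensive C routine:

```python
import ctypes
import ctypes.util

# Assumes a POSIX system.  If find_library returns None, CDLL(None)
# falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

seq_len = libc.strlen(b"GATTACA")
print(seq_len)
```

Hand-written extension modules using the Python C API are another
route, closer in spirit to JNI and XS.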




				Andrew
				dalke at dalkescientific.com




