[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd

Andrew Dalke dalke at dalkescientific.com
Fri Feb 8 10:26:21 PST 2008


On Feb 8, 2008, at 3:48 AM, Paulo Nuin wrote:
> http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccore&cmd=search&term= 
> %20Hantavirus%20segment%20L

Has anyone been able to get the NJ programs to run?

I pulled out 12 sequences from that link.

  GI: 33860560, 157058361, 156567221, 126695337, 124024712, 123967536,
      123965234, 78778385, 111434066, 23464594, 148361453, 55733703

The Python program gave me

   File "NJ.py", line 204, in <module>
     compute_disimilarity(N)
   File "NJ.py", line 157, in compute_disimilarity
     if(list_seq[i][k] == list_seq[j][k]):
IndexError: string index out of range

The Perl program NJ.pl took almost a minute to produce this:

(gi|124024712|ref|NC_:0,(((((((gi|55733703|ref|NC_0:0,((gi|148361453| 
gb|EF58:0.000498480301364114,(gi|111434066|gb| 
DQ82:0.000493233730785394,gi|156567221|gb|EU00:0.000498380445055715): 
0.00194332141965669):0.000732614428260341,gi|157058361|gb|EF64:0): 
0.00177930420757772):0.00143296717474478,gi|23464594|ref| 
NC_0:0):-1.00143648699336,gi|123965234|ref|NC_:0):1.12299044463885,gi| 
33860560|ref|NC_0:0):1.13312404678835,gi|78778385|ref|NC_0:0): 
1.14160938351598,gi|123967536|ref|NC_:0):0.900240655295447,gi| 
126695337|ref|NC_:0.293136656759629):1.30376626963158):0


The C program core dumped.


I then chopped all of the sequences to be the same length; the first  
6530 bases.
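
The trimming step can be sketched in Python roughly as below. This is
only an illustration under assumptions, not necessarily how the
sequences above were trimmed: the function name
truncate_to_common_length and the toy sequences are hypothetical, and
the input is assumed to be a dict mapping sequence names to strings of
bases.

```python
def truncate_to_common_length(seqs, length=None):
    """Return a copy of seqs with every sequence cut to the same length.

    seqs maps a sequence name to its bases. If length is None, the
    length of the shortest sequence is used.
    """
    if not seqs:
        return {}
    shortest = min(len(s) for s in seqs.values())
    if length is None:
        length = shortest
    if length > shortest:
        # Slicing would silently leave the short sequences short,
        # which is the unequal-length situation we want to avoid.
        raise ValueError(
            "requested %d bases but the shortest sequence has %d"
            % (length, shortest))
    return {name: s[:length] for name, s in seqs.items()}


# Toy input (hypothetical), not the Hantavirus data.
seqs = {"a": "ACGTACGT", "b": "ACGTAC", "c": "ACGTACGTAA"}
trimmed = truncate_to_common_length(seqs)
print(sorted(len(s) for s in trimmed.values()))
```

Truncation only makes the lengths agree so that a position-by-position
comparison does not run off the end of a shorter string; it does not
align the sequences.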

The Python program gave me

( ( ( ( ( ( gi|23464594|ref|NC_0: 1.17366651661, ( gi|111434066|gb| 
DQ82: 0.148904182499, gi|156567221|gb|EU00: 0.151185475232):  
1.10169405164): 0.162975482734, ( gi|123965234|ref|NC_:  
0.783397347124, gi|126695337|ref|NC_: 0.908025243214):  
0.179207247629): 0.0571801650779, gi|33860560|ref|NC_0:  
0.970803605456): 0.0659893145874, gi|123967536|ref|NC_:  
1.00734624052): 0.0288793245842, ( gi|78778385|ref|NC_0:  
1.15020230813, gi|124024712|ref|NC_: 0.986748711856):  
0.0994181473971): 0, ( gi|55733703|ref|NC_0: 1.18567966452, ( gi| 
148361453|gb|EF58: 0.163944100589, gi|157058361|gb|EF64:  
0.346254437969): 0.984616394381): 0.154232726999): 0

The Perl program gave me

(((((((gi|148361453|gb|EF58:0,(((gi|123965234|ref|NC_: 
0.988306732076271,gi|33860560|ref|NC_0:0.865560244439941): 
1.01694227538299,gi|123967536|ref|NC_:0):1.23242695730548,gi| 
124024712|ref|NC_:0):1.26867637791678):0.320615925328267,gi|157058361| 
gb|EF64:0):1.53542872692372,gi|126695337|ref|NC_:0.487018431249897): 
1.06214589526658,gi|23464594|ref|NC_0:0):1.14663660251452,(gi| 
111434066|gb|DQ82:0.0762888889451898,gi|156567221|gb| 
EU00:0.223800768786119):0):0.984861244847282,gi|78778385|ref| 
NC_0:0.515925217159195):0,gi|55733703|ref|NC_0:1.24796822390579):0

I assumed they should give identical results.  As you can see, not only
are the outputs not byte-identical, they don't even contain the same
numbers.
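
One way to check the numbers without reading the wrapped output by eye
is to pull the terminal branch length for each sequence label out of
both Newick strings and compare them. A rough sketch follows, assuming
the labels contain no parentheses, commas, colons, or semicolons (true
of the gi|... labels above). The function names leaf_lengths and
differing_leaves are hypothetical, and agreeing leaf lengths would say
nothing about whether the tree topologies agree.

```python
import re

# A leaf looks like "label:length". Internal branches look like
# "):length" and are skipped because a label may not contain ")".
_LEAF = re.compile(r"([^(),:;]+):([^(),:;]+)")


def leaf_lengths(newick):
    """Map each leaf label to its terminal branch length."""
    # Mail clients wrap long lines, so drop all whitespace first.
    text = "".join(newick.split())
    return {label: float(length) for label, length in _LEAF.findall(text)}


def differing_leaves(newick_a, newick_b, tol=1e-9):
    """Return {label: (length_a, length_b)} for leaves that disagree.

    A leaf missing from one tree is reported with None on that side.
    """
    a, b = leaf_lengths(newick_a), leaf_lengths(newick_b)
    diffs = {}
    for label in sorted(set(a) | set(b)):
        la, lb = a.get(label), b.get(label)
        if la is None or lb is None or abs(la - lb) > tol:
            diffs[label] = (la, lb)
    return diffs


# Toy trees (hypothetical), not the output shown above.
tree_a = "((A:0.1, B:0.2):0.05, C:0.3):0"
tree_b = "((A:0.1,B:0.25):\n0.05,C:0.3):0"
print(differing_leaves(tree_a, tree_b))
```

An empty result means every leaf has the same terminal branch length in
both trees, within the tolerance.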

The C program still segfaults.


				Andrew
				dalke at dalkescientific.com




