[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd
Andrew Dalke
dalke at dalkescientific.com
Tue Feb 5 12:21:55 PST 2008
On Feb 5, 2008, at 9:00 PM, Chris Lasher wrote:
> They claim the source code is
> available at the link below, but it doesn't work for me.
>
> http://www.bioinformatics.org/benchmark/
Worked for me just now.
FTP server is at
Here's "readFast.py", with comments by me
import re # not actually used
import sys

header=""
seq=""
list_seq=[]

class Sequence(object):
    # no need to have default arguments here
    # Using "n" and "s" are *bad* parameter names
    def __init__(self,n="",s=""):
        # Don't reverse the formal parameter list and assignment list
        self.seq=s
        self.name=n

# Adding "U" would include "universal" newline support
f=open(sys.argv[1], 'r')
# should do "for line in f:"
for line in f.readlines():
    # What? Did they even *test* this? It doesn't do anything.
    # Should be (there are 2 errors)
    #   line = line.rstrip("\n")
    line.rstrip('/n')
    # if line[:1] == ">" is actually faster
    if line.startswith(">"):
        # should be "if seq:"
        # This code ignores 0 length sequences!
        if len(seq)!=0:
            list_seq.append(Sequence(header,seq))
            seq=""
        header=line[0:21]
    else:
        # Read the FAQ; this is an O(n**2) operation.
        # While recent Pythons have done work to make this
        # naive approach work, it's best to do the
        #   lines.append(line)
        #   "".join(lines)
        seq+=line
# breaks for a file containing no sequences
list_seq.append(Sequence(header,seq))
f.close()
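
Taken together, the comments above suggest something like the following. This is only a minimal sketch, not the paper's code and not a tested replacement for it. The function name read_fasta, the "chunks" list, and the in-memory example lines are assumptions made for illustration. It is written for Python 3, where text mode already handles universal newlines, so the "U" flag is not needed. It keeps the original 21-character header truncation, keeps zero-length sequences, returns an empty list for input with no records, and (a further assumption) skips any lines that appear before the first ">" header.

```python
class Sequence(object):
    """One FASTA record: the header text and the joined sequence."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq


def read_fasta(lines):
    """Parse an iterable of text lines into a list of Sequence objects."""
    records = []
    header = None  # None means no ">" line has been seen yet
    chunks = []    # pieces of the current sequence, joined once per record
    for line in lines:
        # rstrip returns a new string, so the result must be assigned
        line = line.rstrip("\n")
        if line[:1] == ">":
            if header is not None:
                # Emit the previous record even if its sequence is empty
                records.append(Sequence(header, "".join(chunks)))
            header = line[0:21]  # same truncation as the original script
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(Sequence(header, "".join(chunks)))
    return records


# Hypothetical input standing in for the lines of a FASTA file
example = [">seq1 demo\n", "ACGT\n", "TTGA\n", ">empty\n", ">seq2\n", "GG\n"]
records = read_fasta(example)
```

Since a file object iterates line by line, passing an open file (for example "with open(path) as f: records = read_fasta(f)") should work the same way as the list used here.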
This is another place where I get to fume over the abilities of the
paper reviewers (as well as the authors). Do you think they went to
the code and verified that the snippets were idiomatically correct
for the different languages?
Andrew
dalke at dalkescientific.com