[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd

Tue Feb 5 13:14:35 PST 2008

Hi,
Try using:
ftp://www.bioinformatics.org/pub/benchmark/

Got to chuckle at the 'apples and oranges' comparisons across platforms as well!

Bruce

On Feb 5, 2008 2:21 PM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> On Feb 5, 2008, at 9:00 PM, Chris Lasher wrote:
> >  They claim the source code is
> > available at the link below, but it doesn't work for me.
> >
> > http://www.bioinformatics.org/benchmark/
>
> Worked for me just now.
>
> FTP server is at
>
> Here's "readFast.py", with comments by me
>
> import re  # not actually used
> import sys
>
> header=""
> seq=""
> list_seq=[]
>
> class Sequence(object):
>          # no need to have default arguments here
>          # "n" and "s" are *bad* parameter names
>          def __init__(self,n="",s=""):
>                 # Don't reverse the formal parameter list and assignment list
>                  self.seq=s
>                  self.name=n
>
> # Adding "U" would include "universal" newline support
> f=open(sys.argv[1], 'r')
>
> # should do "for line in f:"
> for line in f.readlines():
>
>          # What?  Did they even *test* this?  It doesn't do anything.
>          # Should be (there are 2 errors: the result is discarded,
>          # and '/n' is not a newline)
>          #   line = line.rstrip("\n")
>          line.rstrip('/n')
>
>         # if line[:1] == ">" is actually faster
>          if line.startswith(">"):
>
>                  # should be "if seq:"
>                 # This code ignores 0 length sequences!
>                  if len(seq)!=0:
>                          list_seq.append(Sequence(header,seq))
>                          seq=""
>                  header=line[0:21]
>          else:
>
>                  # Read the FAQ; this is an O(n**2) operation.
>                  # While recent Pythons have done work to make this
>                  # naive approach work, it's best to collect the lines
>                  # in a list and join them once at the end:
>                  #   lines.append(line)
>                  #   "".join(lines)
>                  seq+=line
>
> # breaks for a file containing no sequences
> list_seq.append(Sequence(header,seq))
> f.close()
>
>
> This is another place where I get to fume over the abilities of the
> paper reviewers (as well as the authors).  Do you think they went to
> the code and verified that the snippets were idiomatically correct
> for the different languages?
>
>                                 Andrew
>                                 dalke at dalkescientific.com
>
> _______________________________________________
> biology-in-python mailing list - bip at lists.idyll.org.
>
> See http://bio.scipy.org/ for our Wiki.
>
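
For reference, the comments in the quoted readFast.py could be folded into something like the sketch below. It is only one possible reading of those suggestions, not code from the paper or from Andrew. The names read_fasta, read_fasta_file, records and chunks are made up for illustration. It assumes Python 3, keeps the original's 21-character header truncation, and assumes lines before the first ">" header can be skipped (the original would have collected them under an empty header).

```python
class Sequence(object):
    """One FASTA record: the (truncated) header line and its sequence."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq


def read_fasta(lines):
    """Parse an iterable of text lines into a list of Sequence objects."""
    records = []
    header = None  # None means no ">" line has been seen yet
    chunks = []    # sequence lines collected for the current record
    for line in lines:
        # rstrip returns a new string, so the result has to be kept
        line = line.rstrip("\n")
        if line[:1] == ">":
            if header is not None:
                # Keep the previous record even if its sequence is empty
                records.append(Sequence(header, "".join(chunks)))
            header = line[0:21]  # same truncation as the original script
            chunks = []
        elif header is not None:
            # Assumption: text before the first header is ignored
            chunks.append(line)
    if header is not None:
        # Only emit a final record if the input had at least one header
        records.append(Sequence(header, "".join(chunks)))
    return records


def read_fasta_file(path):
    """Hypothetical convenience wrapper: parse the FASTA file at path."""
    with open(path) as f:
        return read_fasta(f)  # iterates the file line by line
```

Calling read_fasta_file(sys.argv[1]) (after importing sys) would stand in for the body of the original script. With this version a zero-length sequence is kept as a record, and an input with no sequences gives an empty list instead of a bogus entry.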