[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd
Andrew Dalke
dalke at dalkescientific.com
Tue Feb 5 21:19:27 PST 2008
On Feb 6, 2008, at 4:40 AM, Bruce Southey wrote:
> The concern that I have is that showing incorrect code or results is
> not usually a constructive approach. A far bigger concern for me is
> what is actually being compared.
I haven't gotten to the point of looking at the other code bases, or
reading the paper in detail. At this point I'm reviewing the Python
code because it's the one I know best.
What I've found so far is that it wasn't written by someone who knows
Python well. There's a lot of code that is stylistically Perl or
Fortran. This affects both the line count and the performance
numbers. My version of "parse" takes half the number of
non-comment/non-blank lines reported by the paper.
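To give a sense of what I mean by idiomatic, here is a minimal FASTA
reader sketch (the function name read_fasta and its generator shape are
mine, not the paper's readFasta.py):

```python
def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA file handle."""
    title = None
    parts = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            if title is not None:
                # emit the previous record before starting a new one
                yield title, "".join(parts)
            title = line[1:]
            parts = []
        elif line:
            parts.append(line)
    if title is not None:
        yield title, "".join(parts)
```

A generator like this needs no record classes and no bookkeeping lists,
which is where most of the line-count savings come from.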
In the one case where I was able to generate numbers, changing a
"0.0" to a "0" gave a 10% performance improvement, and using psyco
gave a 94% speedup. The paper talks about using C extensions for
Perl and Java, but not for Python.
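Both micro-changes are tiny. Here's a sketch of what I mean (the
function names and scoring loop are illustrative, not the paper's code):

```python
def score_ints(pairs):
    total = 0          # integer start: keeps the arithmetic in ints,
                       # which is cheaper in CPython
    for a, b in pairs:
        total += 1 if a == b else -1
    return total

def score_floats(pairs):
    total = 0.0        # float start: every += is now float arithmetic
    for a, b in pairs:
        total += 1 if a == b else -1
    return total

# Enabling psyco (a Python 2-era JIT) was two lines at module top:
try:
    import psyco
    psyco.full()
except ImportError:
    pass  # psyco is simply unavailable on later Pythons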
These, to me, are methodological errors in the treatment of the Python
code. I assume they hold for the other languages as well, more or less.
Looking a bit at the other code bases, I see what you mean. A
serious question is: what's the point of the paper?
    This benchmark provides a comparison of six commonly used
    programming languages under two different operating systems. The
    overall comparison shows that a developer should choose an
    appropriate language carefully, taking into account the performance
    expected and the library availability for each language.
That's not a shocker. That's ... boring.
> I don't think that the same algorithm
> is being implemented in each language! For example, the readFasta.py
> involves a list of classes that I don't think is done in the other
> languages - anyhow it is also so inefficient that it is trivial to
> reduce its time by nearly half. Also, the code don't seem to have any
> error checking ability or test ability to assess if the code is even
> correct.
The output of the Perl and Python versions is the same, except for an
extra newline added because the author thought that Python's "print"
needs an explicit "\n" just as Perl's does. The C output adds the text
"Align1 " and "Align2 " to the output, making it harder to diff.
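That is, Python's print already appends a newline, so the Perl habit of
writing "\n" yields double spacing. In Python 3 syntax (the paper used
the Python 2 print statement, but the behavior is the same):

```python
import io

buf = io.StringIO()
print("ACGT", file=buf)        # one trailing newline, as in Perl's say
print("ACGT\n", file=buf)      # Perl habit: produces an extra blank line
assert buf.getvalue() == "ACGT\nACGT\n\n"
```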
Some additional questions to address:
- Should each of the implementations be idiomatically correct?
- Should an implementation trade off performance vs. readability?
(readability is itself language-dependent; Python programmers don't
read regexes as much as Perl programmers do)
- Are 3rd party packages allowed? For example, Numeric for Python,
or ragel for making a parser for C. Or psyco.
- Should the programs be "normal" or "expert"? That is, if I were
to write a Python parser for performance, with access to 3rd party
packages, I could pull in mxTextTools and make a very fast system.
But it wouldn't be what most people would do and it wouldn't be
maintainable for most environments.
Or I could use mmap and probably get a decent boost that way. See
for example the discussion on "wide finder" at
http://www.dalkescientific.com/writings/diary/archive/2007/10/07/wide_finder.html
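The mmap idea, roughly: map the file once and scan the bytes directly,
instead of paying per-line I/O overhead. A sketch (count_headers is my
name for it; it just counts FASTA ">" header lines):

```python
import mmap

def count_headers(path):
    """Count FASTA header lines by scanning an mmap'd file for b'>'."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(b">")
            while pos != -1:
                # only count '>' at the start of a line
                if pos == 0 or mm[pos - 1:pos] == b"\n":
                    count += 1
                pos = mm.find(b">", pos + 1)
            return count
```

The find() calls run in C over the mapped buffer, which is where the
boost would come from.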
- Why is the Python version for Linux running 2.4.4 (a bug-fix
release from October 18, 2006) and not Python 2.5 (released September 19,
2006)? I ask this in part because I helped add some string
optimizations to Python's core specifically to make some parsing
tasks go faster, and others added further optimizations.
The paper says "Only C# and Python appeared consistently faster in
every program on Windows." That's not because of Windows being
special. It's because the Windows tests used Python 2.5, which
included all of those optimizations.
> I found it more disturbing the lack of literature review including
> different papers (http://www.cis.udel.edu/~silber/470STUFF/article.pdf
> or http://page.mi.fu-berlin.de/prechelt/Biblio/jccpprtTR.pdf ) or
The latter Prechelt paper is [8] in the reference list. However,
it covers a different topic. In those two papers the purpose was to
see how people with relatively equal skill in a given language
implement solutions to the same problem, then to evaluate the
aggregate properties of those solutions. Multiple people implemented
the program for each language.
In this case one person implemented the same program in different
languages, with different skill levels in each language.
Because the author of the programs was a coauthor of the paper,
there's a possibility of bias as well. This wasn't a blinded
comparison, and there was no incentive to improve every program for
better performance, lower memory use, or fewer lines of code.
Andrew
dalke at dalkescientific.com