[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd
Andrew Dalke
dalke at dalkescientific.com
Tue Feb 5 19:48:27 PST 2008
On Feb 6, 2008, at 2:35 AM, Titus Brown wrote:
> Well, the mailing list archives are open and searchable, so I'm sure
> you're welcome to do so ;)
I've no problem.
> I suspect that the code could use some systematic code review; "we"
> (i.e. someone else :) could even write up something semi-formal if it
> turns out that the results are bogus.
It's hard to evaluate the BLAST parsing time without knowing how they
generated the data file. I can read a million lines a second, while
they report:
    Python was the worst performer for parsing a BLAST file (Fig 3),
    taking more than 38 minutes to process the file compared to Perl,
    which took only 7.28 minutes. This difference did not arise from
    any inability of Python to handle large files, since it took only
    3.2 minutes to read the file without processing the lines. Perl
    accomplished the same task in only 1.4 minutes.
Assuming the same disk speed,
    1000000 lines/sec * 30 bytes/line = 30 MB/sec
    3.2 minutes * 60 sec/min * 30 MB/sec = 5+ GB
As a rough guess that's 4GB or larger. Hmm, but I was reading from a
gzipped file. Still, it'll have to be a *huge* file to get that slow
performance.
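
For anyone who wants to redo the back-of-the-envelope estimate with their own numbers, here is a minimal sketch. The function names are made up for illustration, and the 30 bytes/line figure is only the guess used above, not a measured property of the paper's BLAST file. The second function is one possible way to time a bare line-by-line read with no processing; it is not the code used in the paper.

```python
import time


def estimated_file_size_bytes(lines_per_sec, bytes_per_line, read_minutes):
    """File size implied by a sustained read rate over a given wall-clock time."""
    bytes_per_sec = lines_per_sec * bytes_per_line
    return bytes_per_sec * read_minutes * 60


def measure_lines_per_sec(path):
    """Time a bare line-by-line read; return (line count, lines per second)."""
    start = time.perf_counter()
    count = 0
    with open(path, "rb") as infile:
        for _ in infile:  # iterate over lines without touching their content
            count += 1
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


# Assumed inputs: 1,000,000 lines/sec, 30 bytes/line, 3.2 minutes of reading.
size = estimated_file_size_bytes(1_000_000, 30, 3.2)
print(f"{size / 1e9:.2f} GB")  # prints "5.76 GB"
```

Plugging a measured rate from measure_lines_per_sec into the estimate, in place of the million-lines-a-second figure, would show how sensitive the implied file size is to the machine and to whether the input is compressed.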
What constitutes bogus enough? I think the results for Python are
bogus, the methodology is bogus, and two of the three benchmarks,
being without test data, are also bogus.
Andrew
dalke at dalkescientific.com