[bip] Bioinformatics Programming Language Shootout, Python performance poopoo'd
Andrew Dalke
dalke at dalkescientific.com
Thu Feb 7 17:49:02 PST 2008
On Feb 7, 2008, at 10:30 PM, Paulo Nuin wrote:
> I am trying to do a blastp with the sequence ID from the paper and
> I am not getting a 9.8 Gb file. Not even 9.8 Mb. I have tested a
> couple of methods, even using Geneious to do the blast. Anyone else
> tried to obtain this file?
> I wanted to redo the "benchmarks" with some code modifications.
Same here. I'm trying to get a test set for the RT code. All I have
to go on is

    In the test example 76 Hantavirus segment L sequences were used
    with an overall alignment length of 6580 nucleotides.

Any idea of which sequences?
Has anyone figured out how the LOC counts were generated? This gives
the right answer of 119 for alignment.c:

    egrep -v '^ *\*' alignment.c | perl -pe 's/\t/ /g' | egrep -v '^ *$' |
      egrep -v '^ *//' | egrep -v '^ */\*' | egrep -v '^ *\}' | wc -l
Broken down, that's

    egrep -v '^ *\*' alignment.c |  # remove comment continuations
      perl -pe 's/\t/ /g' |         # replace tabs with spaces
      egrep -v '^ *$' |             # remove blank lines
      egrep -v '^ *//' |            # remove C++-style comments (not legal in C90)
      egrep -v '^ */\*' |           # remove lines which start a comment
      egrep -v '^ *\}' |            # remove lines that begin with a "}"
      wc -l
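
For anyone who would rather rerun the counts from Python, here is a minimal sketch of the same filter. It is only my reading of the pipeline above, not the method the paper's authors used; the function name count_loc and the pattern-list names are made up for illustration. It keeps the pipeline's ordering, so the comment-continuation test runs before tabs are replaced, which means a tab-indented " * ..." line is still counted.

```python
import re

# Applied to the raw line, before tab replacement (first egrep in the pipeline).
_BEFORE_TABS = [
    re.compile(r"^ *\*"),    # comment continuation lines
]

# Applied after tabs have been replaced with spaces (remaining egreps).
_AFTER_TABS = [
    re.compile(r"^ *$"),     # blank lines
    re.compile(r"^ *//"),    # C++-style comment lines
    re.compile(r"^ */\*"),   # lines that start a /* comment
    re.compile(r"^ *\}"),    # lines that begin with a closing brace
]


def count_loc(text):
    """Count the lines of `text` that survive the filters above."""
    count = 0
    for line in text.splitlines():
        if any(p.search(line) for p in _BEFORE_TABS):
            continue
        line = line.replace("\t", " ")
        if any(p.search(line) for p in _AFTER_TABS):
            continue
        count += 1
    return count
```

Something like count_loc(open("alignment.c").read()) should then mirror the shell one-liner, assuming the file decodes with the default encoding.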
However, there's dead code in that module.

    char* insert(char* str, char car){
      int l=strlen(str);
      str=(char*)malloc(sizeof(char) * l+1);
      str[l]=car;
      return str;
    }

is never referenced.
If I use the same filter to get the line count for NJ.c and reader.c
I get 175+77 = 252 while they quote 240. For "parser.c" and
"parseRE.c" I get 81, instead of 82.
And it should be shorter. Given

    size_line=strlen(line);
    line[size_line-1] = '\0';
    size_line--;
    if(line[0] == '>'){
      // don't copy >
      memcpy(name, line+1, size_line-1);
there are two places where the right-side character is chopped off.
There should be only one. That's a bug. Anyone surprised? Though
more seriously, since there's no way to tell what it means to be right,
there's no reason to have some of this code at all.
Andrew
dalke at dalkescientific.com