[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)
Andrew Dalke
dalke at dalkescientific.com
Thu Jan 17 16:07:47 PST 2008
On Jan 17, 2008, at 11:26 PM, Brent Pedersen wrote:
> not to mention the absurdity of trading 6 letters "[::-1]" for 10+
> lines in an external compiled library.
Sometimes knowing that the tradeoff exists and how to do it means
that when there's a real problem you know a solution.
I've elsewhere talked about "practice pieces" - small things that can
be done in a few minutes to an hour to get a feel for how something
works. For example, I had not used Pyrex before this, so this was a
nice way for me to finally try it out.
Of course the next step is to see what Python does. If this is
important enough then perhaps that's something which can go into the
runtime. The code is in stringobject.c, and after the special cases
for 0 length slices, and step size == 1 is this bit of code:
else {
source_buf = PyString_AsString((PyObject*)
self);
result_buf = (char *)PyMem_Malloc(slicelength);
if (result_buf == NULL)
return PyErr_NoMemory();
for (cur = start, i = 0; i < slicelength;
cur += step, i++) {
result_buf[i] = source_buf[cur];
}
result = PyString_FromStringAndSize(result_buf,
slicelength);
Potentially this could be made faster for the special of step == -1,
but I don't think by much. For example, "cur += step" is probably
slower than "cur -= 1", so a special purpose branch might make that
faster at the expense of one extra test for all other cases.
Another possibility is to memcpy from source_buf to result_buf then
do an in-place reversal in result_buf. This would be faster if the
memcpy is significantly faster than this simple byte-oriented version
and memory bandwidth wasn't a problem.
See an example of a fast memcpy at http://www.vik.cc/daniel/portfolio/
memcpy.htm and there are similar cases elsewhere.
The downside of this approach is there are 3*N/2 writes and 3*N/2
reads, instead of the expected N and N.
Something to bear in mind is that memory is slow. Each fetch is
something like 30 or 40 clock cycles, and an instruction takes only a
small number of cycles. It's sometimes better to do more work than
to hit memory.
Consider this from the implementation of Python's string.lower
for (i = 0; i < n; i++) {
int c = Py_CHARMASK(s[i]);
if (isupper(c))
s[i] = _tolower(c);
}
and compare it to my naive optimization which assumes memory is cheap
for (i = 0; i < n; i++) {
int c = Py_CHARMASK(s[i]);
s[i] = _tolower(c);
}
You would think that doing less work (not calling isupper) would mean
better performance. But because this is memory bound, doing fewer
writes make the code measurably faster.
So, going back to the [::-1] case, the fastest solution would be to
implement something with only N reads and N writes. That could be
done along the lines of how memcpy works, but there's a disadvantage
that the strings are likely not word aligned, so the usual
optimization (copying via ints instead of chars) is complicated, and
I'm not convinced it would be that much faster.
Still, there are some existing implementations you can try. There's
std::reverse_copy in C++ which might have optimizations for this. I
looked for other implementations for a "copy reverse string"
algorithm but didn't find any. The likely names of revcopy and
variants are either the byte-oriented implementation or they do the
normal copy but starting from the end of string instead of the
beginning.
Because it's complicated, it's hard to justify adding to Python's
core unless there was a significant improvement. And in biology most
cases for doing reverse are reverse-complement, the high-performance
code would also do the conversion via a translation table. And
*that*'s not needed in code Python.
Andrew
dalke at dalkescientific.com
More information about the biology-in-python
mailing list