[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)

Thu Jan 17 16:07:47 PST 2008

On Jan 17, 2008, at 11:26 PM, Brent Pedersen wrote:
> not to mention the absurdity of trading 6 letters "[::-1]" for 10+
> lines in an external compiled library.

Sometimes knowing that the tradeoff exists and how to do it means  
that when there's a real problem you know a solution.

I've elsewhere talked about "practice pieces" - small things that can  
be done in a few minutes to an hour to get a feel for how something  
works.  For example, I had not used Pyrex before this, so this was a  
nice way for me to finally try it out.

Of course the next step is to see what Python does.  If this is  
important enough then perhaps that's something which can go into the  
runtime.  The code is in stringobject.c, and after the special cases  
for 0 length slices, and step size == 1 is this bit of code:

                 else {
                         source_buf = PyString_AsString((PyObject*) 
self);
                         result_buf = (char *)PyMem_Malloc(slicelength);
                         if (result_buf == NULL)
                                 return PyErr_NoMemory();

                         for (cur = start, i = 0; i < slicelength;
                              cur += step, i++) {
                                 result_buf[i] = source_buf[cur];
                         }

                         result = PyString_FromStringAndSize(result_buf,

slicelength);

Potentially this could be made faster for the special of step == -1,  
but I don't think by much.  For example, "cur += step" is probably  
slower than "cur -= 1", so a special purpose branch might make that  
faster at the expense of one extra test for all other cases.

Another possibility is to memcpy from source_buf to result_buf then  
do an in-place reversal in result_buf.  This would be faster if the  
memcpy is significantly faster than this simple byte-oriented version  
and memory bandwidth wasn't a problem.

See an example of a fast memcpy at http://www.vik.cc/daniel/portfolio/ 
memcpy.htm and there are similar cases elsewhere.

The downside of this approach is there are 3*N/2 writes and 3*N/2  
reads, instead of the expected N and N.

Something to bear in mind is that memory is slow.  Each fetch is  
something like 30 or 40 clock cycles, and an instruction takes only a  
small number of cycles.  It's sometimes better to do more work than  
to hit memory.

Consider this from the implementation of Python's string.lower

         for (i = 0; i < n; i++) {
                 int c = Py_CHARMASK(s[i]);
                 if (isupper(c))
                         s[i] = _tolower(c);
         }

and compare it to my naive optimization which assumes memory is cheap

         for (i = 0; i < n; i++) {
                 int c = Py_CHARMASK(s[i]);
                 s[i] = _tolower(c);
         }

You would think that doing less work (not calling isupper) would mean  
better performance.  But because this is memory bound, doing fewer  
writes make the code measurably faster.

So, going back to the [::-1] case, the fastest solution would be to  
implement something with only N reads and N writes.  That could be  
done along the lines of how memcpy works, but there's a disadvantage  
that the strings are likely not word aligned, so the usual  
optimization (copying via ints instead of chars) is complicated, and  
I'm not convinced it would be that much faster.

Still, there are some existing implementations you can try.  There's  
std::reverse_copy in C++ which might have optimizations for this.  I  
looked for other implementations for a "copy reverse string"  
algorithm but didn't find any.  The likely names of revcopy and  
variants are either the byte-oriented implementation or they do the  
normal copy but starting from the end of string instead of the  
beginning.

Because it's complicated, it's hard to justify adding to Python's  
core unless there was a significant improvement.  And in biology most  
cases for doing reverse are reverse-complement, the high-performance  
code would also do the conversion via a translation table.  And  
*that*'s not needed in code Python.

				Andrew
				dalke at dalkescientific.com