[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)

Thu Jan 17 13:58:57 PST 2008

On Jan 17, 2008, at 9:49 PM, Brent Pedersen wrote:
> cython big guns make it another 2x as
> fast as seq[::-1].
>
> cdef extern from "stdio.h":
>     cdef Py_ssize_t strlen(char *)
>
> def inplace_rev(char* seq):
>     cdef int l = strlen(seq)
>     cdef int i
>     for i from 0 <= i < l / 2 :
>         seq[i] , seq[l - i - 1] =  seq[l - i - 1],  seq[i]

I didn't know there was a spinoff from Pyrex - cool!

However, doesn't that break the fundamental Python assumption that
strings are immutable?  The following should cause minor breakage at
first, then major breakage:

import string
import strrev
strrev.inplace_rev(string.ascii_letters)

import __builtin__
for k in dir(__builtin__):
    strrev.inplace_rev(k)

Because of the immutability constraint, the Python implementation is
free to reuse existing string objects, which is why the following
happens (using my Pyrex implementation, below):

 >>> import strrev
 >>> s = "This"
 >>> t = "This"
 >>> strrev.inplace_rev(s)
 >>> t
'sihT'
 >>>
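
The sharing can be seen from pure Python, without any extension
module.  This is only a minimal sketch, assuming Python 3 (where the
function is sys.intern; in Python 2 it is the builtin intern), and the
variable names are just for illustration.  It shows two names ending up
on one string object, and that slicing builds a new string and leaves
the shared one alone.

```python
import sys

# Build "This" at run time, then intern it so equal strings share one object.
s = sys.intern("".join(["Th", "is"]))
t = sys.intern("This")

# Both names now refer to the same object, so an in-place change
# made through s would also be visible through t.
same_object = s is t

# Slicing creates a new string instead; t is left untouched.
r = s[::-1]
print(same_object, r, t)
```

Here same_object is True, r is 'sihT' and t is still 'This'.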

There's also a problem if the string contains a NUL character, since  
you're using the C definition of string length and not Python's.  You  
should be able to call len(seq) directly instead of using strlen.
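
The difference between the two notions of length is easy to show from
Python.  This is a sketch using the standard ctypes module, not the
Pyrex code in this thread: a ctypes buffer's .value attribute reads the
data as a NUL-terminated C string, while len() on the Python object
counts every byte.  The sample sequence is made up.

```python
import ctypes

data = b"ACG\x00TTA"           # hypothetical sequence with an embedded NUL

buf = ctypes.create_string_buffer(data)

python_length = len(data)      # Python counts every byte, NUL included
c_view = buf.value             # C-style view: stops at the first NUL
c_length = len(c_view)         # the length strlen() would report

print(python_length, c_length)
```

python_length is 7 while c_length is 3, so a strlen-based loop would
only reverse the first three bytes.  A function declared with a char*
argument only receives the C pointer, so it is worth checking that the
length it uses really is the Python one.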

Here's my version of your function.  I changed it slightly because I
thought this form was easier to inspect, to verify that it works for
both even and odd lengths, and to confirm that it doesn't do an extra
swap at the middle point of odd-length strings.

def inplace_rev(char* seq):
    cdef int start, end
    start = 0
    end = len(seq) - 1  # might not work if len(seq) > C's max signed int
    while start < end:
        seq[start], seq[end] = seq[end], seq[start]
        start = start + 1
        end = end - 1
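
The same two-index loop can be checked in plain Python on a bytearray,
which is mutable, so nothing shared gets corrupted.  This is a sketch
for testing the logic only; the name reverse_in_place is made up and it
is not meant as a replacement for the Pyrex version.

```python
def reverse_in_place(buf):
    """Reverse a bytearray in place by swapping inward from both ends."""
    start = 0
    end = len(buf) - 1
    while start < end:
        buf[start], buf[end] = buf[end], buf[start]
        start += 1
        end -= 1


even = bytearray(b"ACGT")    # even length: every byte is swapped once
odd = bytearray(b"ACGTN")    # odd length: the middle byte is never touched
empty = bytearray()          # empty input: the loop body never runs

for b in (even, odd, empty):
    reverse_in_place(b)

print(even, odd, empty)
```

After the loop, even holds TGCA, odd holds NTGCA and empty is still
empty.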

				Andrew
				dalke at dalkescientific.com