[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)
Andrew Dalke
dalke at dalkescientific.com
Thu Jan 17 13:58:57 PST 2008
On Jan 17, 2008, at 9:49 PM, Brent Pedersen wrote:
> cython big guns make it another 2x as
> fast as seq[::-1].
>
> cdef extern from "stdio.h":
>     cdef Py_ssize_t strlen(char *)
>
> def inplace_rev(char* seq):
>     cdef int l = strlen(seq)
>     cdef int i
>     for i from 0 <= i < l / 2 :
>         seq[i] , seq[l - i - 1] = seq[l - i - 1], seq[i]
I didn't know there was a spinoff from Pyrex - cool!
However, doesn't that break a fundamental assumption in Python that
strings are immutable? The following should first cause minor breakage,
then major breakage:
import string
import strrev
strrev.inplace_rev(string.ascii_letters)
import __builtin__
for k in dir(__builtin__):
    strrev.inplace_rev(k)
Because of the immutable constraint, the Python implementation is
free to reuse existing strings, which is why the following happens
(using my Pyrex implementation, below):
>>> import strrev
>>> s = "This"
>>> t = "This"
>>> strrev.inplace_rev(s)
>>> t
'sihT'
>>>
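The sharing seen in the session above can also be observed without any C extension. The following is a minimal sketch, assuming CPython 3 (where sys.intern replaces the Python 2 built-in intern); the helper name same_object is made up for illustration. Whether two equal literals share one object is an implementation detail, so the sketch relies only on explicitly interned strings.

```python
import sys


def same_object(a, b):
    """Return True if a and b are the very same object, not merely equal."""
    return a is b


# Build "This" at run time so it is not simply the same literal constant,
# then intern both strings: equal interned strings are one shared object.
s = sys.intern("".join(["Th", "is"]))
t = sys.intern("This")

print(s == t)             # the contents are equal
print(same_object(s, t))  # and both names refer to a single object
```

Because both names refer to one object, any C-level write through s is visible through t, which is the effect shown in the interactive session.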
There's also a problem if the string contains a NUL character, since
you're using the C definition of string length and not Python's. You
should be able to call len(seq) directly instead of using strlen.
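To illustrate the difference between the two length definitions, here is a small sketch using the standard ctypes module. The byte sequence is a made-up example, and the .value attribute is used here only as a stand-in for what strlen would report, since it stops at the first NUL.

```python
import ctypes

data = b"ACGT\x00TTGA"  # hypothetical input: nine bytes with a NUL in the middle

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(data)

c_length = len(buf.value)  # .value stops at the first NUL, as strlen would
py_length = len(data)      # Python counts every byte, NUL included

print(c_length, py_length)
```

A reversal driven by the C length would touch only the bytes before the NUL and leave the rest of the Python string in its original order.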
Here's my version of your function. I changed it slightly because I
thought this was easier to inspect and verify that it works for even
and odd lengths, and that it doesn't do an extra swap at the middle
point of odd length strings.
def inplace_rev(char* seq):
    cdef int start, end
    start = 0
    end = len(seq)-1  # might not work if len(seq) > C's max signed int
    while start < end:
        seq[start], seq[end] = seq[end], seq[start]
        start = start + 1
        end = end - 1
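The same two-index loop can be tried in plain Python on a bytearray, which is mutable by design, so no immutability assumption is violated. This is only a sketch for checking the even, odd, and empty cases, assuming Python 3; it is not the Pyrex function above and makes no claim about speed.

```python
def inplace_rev(seq):
    """Reverse a mutable sequence such as a bytearray in place."""
    start = 0
    end = len(seq) - 1
    # The indices walk toward each other; for odd lengths they meet at the
    # middle element, which is left alone rather than swapped with itself.
    while start < end:
        seq[start], seq[end] = seq[end], seq[start]
        start += 1
        end -= 1


even = bytearray(b"ACGT")
odd = bytearray(b"ACGTN")
empty = bytearray()
for seq in (even, odd, empty):
    inplace_rev(seq)

print(even, odd, empty)
```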
Andrew
dalke at dalkescientific.com