[bip] mRNA lengths of all Human/Mouse refseqs (tutorial)

James Taylor james at jamestaylor.org
Thu Jan 17 14:00:56 PST 2008


Now that is incredibly scary. Strings are immutable; the C API is
very clear on this, and for good reason. And yet this code appears
to do exactly what it claims to do:

 >>> f = "foobar"
 >>> buffer(f)
<read-only buffer for 0x2affda05ae40, size -1, offset 0 at 0x2affda05bed8>
 >>> rev.inplace_rev( f )
 >>> f
'raboof'
 >>> buffer(f)
<read-only buffer for 0x2affda05ae40, size -1, offset 0 at 0x2affda05bf10>

Bad bad bad! Interned strings make this a real mess, because two names
bound to equal literals can share a single object. For example:

 >>> f = "foobar"
 >>> g = "foobar"
 >>> f
'foobar'
 >>> g
'foobar'
 >>> id(f)
47278362832592
 >>> id(g)
47278362832592
 >>> buffer( f )
<read-only buffer for 0x2affda05aed0, size -1, offset 0 at 0x2affda05bed8>
 >>> buffer( g )
<read-only buffer for 0x2affda05aed0, size -1, offset 0 at 0x2affda05bf48>
 >>> rev.inplace_rev( f )
 >>> f
'raboof'
 >>> g
'raboof'

Yikes!
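
The aliasing in that session can be reproduced without any C-level
mutation. What follows is only a minimal sketch, assuming Python 3,
where the interning function is sys.intern (in Python 2 it is the
builtin intern). It shows that interned strings with equal contents are
one object, and that building a new string with slicing leaves the
other name alone:

```python
import sys

# sys.intern returns the canonical copy of a string, so two interned
# strings with equal contents are the very same object.
f = sys.intern("foobar")
g = sys.intern("foobar")

same_object = f is g  # both names refer to one shared object

# Slicing builds a new string and rebinds f; the shared object that g
# still refers to is not modified.
f = f[::-1]

print(same_object, f, g)
```

Because f and g share one object, anything that writes into that
object's memory behind Python's back changes every name bound to it,
which is what the session with rev.inplace_rev shows.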

-- jt

On Jan 17, 2008, at 3:49 PM, Brent Pedersen wrote:

> since you did declare it a war. cython big guns make it another 2x as
> fast as seq[::-1].
>
> cdef extern from "string.h":
>     cdef Py_ssize_t strlen(char *)
>
> def inplace_rev(char* seq):
>     cdef int l = strlen(seq)
>     cdef int i
>     for i from 0 <= i < l / 2 :
>         seq[i] , seq[l - i - 1] =  seq[l - i - 1],  seq[i]
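
For comparison, here is a minimal sketch of the same swap loop applied
to a mutable buffer, which is a safe place to do an in-place reversal.
It is plain Python rather than Cython, so it says nothing about speed,
and the name inplace_rev_bytes is made up for this illustration. It
assumes the sequence can be held in a bytearray:

```python
def inplace_rev_bytes(seq):
    """Reverse a bytearray in place (hypothetical helper, not from the thread)."""
    # Refuse immutable inputs such as str or bytes instead of corrupting them.
    if not isinstance(seq, bytearray):
        raise TypeError("inplace_rev_bytes needs a mutable bytearray")
    n = len(seq)
    # Swap symmetric positions; the middle element of an odd-length
    # sequence stays where it is.
    for i in range(n // 2):
        seq[i], seq[n - i - 1] = seq[n - i - 1], seq[i]


buf = bytearray(b"foobar")
inplace_rev_bytes(buf)
print(buf.decode("ascii"))
```

A bytearray is never interned or shared between equal literals, so
reversing buf cannot change any other object.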



