[bip] fast sequence searching algorithm... in python

Bruce Southey bsouthey at gmail.com
Wed Sep 26 06:31:00 PDT 2007


Hi,
As Titus points out, string methods could be used and these are often faster
than regular expressions.
See the Regular Expression HOWTO
<http://www.amk.ca/python/howto/regex/regex.html>, especially section 6:
http://www.amk.ca/python/howto/regex/regex.html#SECTION000710000000000000000
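For an exact, case-insensitive search, the regular expression in Sagar's code can be replaced by a str.find loop. The function below is a minimal sketch, and its name find_all_spans is hypothetical. It assumes plain ASCII sequence strings, so uppercasing both strings gives the same result as re.IGNORECASE. One difference from re.finditer is that this loop also reports overlapping hits.

```python
def find_all_spans(query, subject):
    """Return (start, stop) spans of every exact, case-insensitive hit of
    query in subject, including overlapping hits."""
    # Uppercasing both strings stands in for re.IGNORECASE (ASCII input assumed).
    q = str(query).upper()
    s = str(subject).upper()
    spans = []
    if not q:
        return spans  # an empty query would otherwise match everywhere
    start = s.find(q)
    while start != -1:
        spans.append((start, start + len(q)))
        # Advance by one so overlapping hits (e.g. 'AA' in 'AAA') are found;
        # use start + len(q) instead to mimic re.finditer's non-overlapping hits.
        start = s.find(q, start + 1)
    return spans


print(find_all_spans("aa", "cAAAtaa"))  # [(1, 3), (2, 4), (5, 7)]
```

If many queries are run against the same 10 kb subject, uppercase the subject once outside the function instead of on every call.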

Also, if you are matching the same pattern multiple times, it is recommended
that you compile the pattern once (re.compile) and reuse the compiled object,
for speed.
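Compiling once and reusing the pattern might look like the sketch below. The helper names compile_query and spans_for_subjects are hypothetical. Because the re module keeps a small internal cache of recently used patterns, the gain from explicit compilation may be modest. Its main benefit is skipping that cache lookup on every call in a tight loop.

```python
import re


def compile_query(query):
    """Compile an escaped, case-insensitive pattern for one query, once."""
    # re.escape makes characters such as '.' or '*' match literally.
    return re.compile(re.escape(str(query)), re.IGNORECASE)


def spans_for_subjects(query, subjects):
    """Search one query against many subjects, reusing one compiled pattern."""
    pattern = compile_query(query)
    return [[m.span() for m in pattern.finditer(str(s))] for s in subjects]


print(spans_for_subjects("acg", ["ACGTACG", "ttttt", "aCg"]))
# [[(0, 3), (4, 7)], [], [(0, 3)]]
```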

Bruce


On 9/26/07, Titus Brown <titus at caltech.edu> wrote:
>
> On Tue, Sep 25, 2007 at 07:30:07PM -0700, Sagar Damle wrote:
> -> Hi all,
> ->   I'm in need of a fast sequence matching algorithm for DNA/RNA/
> -> protein sequences.  My query searches are relatively short (<100bp)
> -> and 'subject' sequence is on the order of 10kb.  At the moment, I'm
> -> using the span() function in python's regular expression module to
> -> return all matches:
> ->
> -> matches = [match.span() for match in re.finditer(re.escape(str(query)),
> ->            str(subject), re.IGNORECASE)]
> ->
> ->   This basically returns all coordinates of my query against subject
> -> as a list of tuples (match start, match stop), but it's somewhat
> -> sluggish for my needs.  Is there a way to do it faster within python?
>
> Hey Sagar,
>
> a few questions...
>
> 1. Do you need full regular expressions, or can you just use
> string.find?
>
> 2. Do you want to allow mismatches?
>
> 3. Are you looking for fixed-length matches, or do you want gapped
> matching?
>
> I would probably just use BLAST for this myself, unless I only cared
> about exact matches, in which case I'd use string.find.  If I needed a
> precise number of mismatches I'd use motility.
>
> cheers,
> --titus
>
> _______________________________________________
> biology-in-python mailing list
> biology-in-python at lists.idyll.org
> http://lists.idyll.org/listinfo/biology-in-python
>
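Titus's second question, whether mismatches are allowed, changes the problem: neither str.find nor an escaped regular expression can tolerate them. Titus suggests motility for that case. The sketch below is not motility's API. It is a hypothetical brute-force stand-in (find_with_mismatches is an illustrative name) that compares every ungapped window of the subject with the query and keeps those within a given Hamming distance. It assumes ASCII sequences and may be slow in pure Python for long subjects.

```python
def find_with_mismatches(query, subject, max_mismatches):
    """Naive scan returning (start, stop, mismatches) for every window of
    subject whose Hamming distance to query is <= max_mismatches.
    Ungapped and case-insensitive; O(len(query) * len(subject)) worst case."""
    q = query.upper()
    s = subject.upper()
    n = len(q)
    hits = []
    if n == 0:
        return hits
    for start in range(len(s) - n + 1):
        mismatches = 0
        for a, b in zip(q, s[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this window cannot qualify
        if mismatches <= max_mismatches:
            hits.append((start, start + n, mismatches))
    return hits


print(find_with_mismatches("ACGT", "acgaACGTtcgt", 1))
# [(0, 4, 1), (4, 8, 0), (8, 12, 1)]
```

With max_mismatches=0 this reduces to exact (overlapping) matching. Gapped matching, Titus's third question, would need an alignment tool such as BLAST rather than this scan.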

