Hi,<br>As Titus points out, string methods could be used and these are often faster than regular expressions. <br>See the <a href="http://www.amk.ca/python/howto/regex/regex.html">Regular expression HowTo </a>, especially
<a href="http://www.amk.ca/python/howto/regex/regex.html#SECTION000710000000000000000">section 6</a>:<br><a href="http://www.amk.ca/python/howto/regex/regex.html#SECTION000710000000000000000">http://www.amk.ca/python/howto/regex/regex.html#SECTION000710000000000000000
</a><br><br>Also, if you are matching the same pattern multiple times, it is recommended recompiling the pattern (re.compile) for speed.<br><br>Bruce<br><br><br><div><span class="gmail_quote">On 9/26/07, <b class="gmail_sendername">
Titus Brown</b> <<a href="mailto:titus@caltech.edu">titus@caltech.edu</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
On Tue, Sep 25, 2007 at 07:30:07PM -0700, Sagar Damle wrote:<br>-> Hi all,<br>-> I'm in need of a fast sequence matching algorithm for DNA/RNA/<br>-> protein sequences. My query searches are relatively short (<100bp)
<br>-> and 'subject' sequence is on the order of 10kb. At the moment, I'm<br>-> using the span() function in python's regular expression module to<br>-> return all matches:<br>-><br>-> matches = [
match.span() for match in re.finditer(re.escape(str<br>-> (query)), str(subject), re.IGNORECASE )]<br>-><br>-> This basically returns all coordinates of my query against subject<br>-> as a list of tuples (match start, match stop), but its somewhat
<br>-> sluggish for my needs. Is there a way to do it faster within python?<br><br>Hey Sagar,<br><br>a few questions...<br><br>1. Do you need full regular expressions, or can you just use<br>string.find?<br><br>2. Do you want to allow mismatches?
<br><br>3. Are you looking for fixed-length matches, or do you want gapped<br>matching?<br><br>I would probably just use BLAST for this myself, unless I only cared<br>about exact matches, in which case I'd use string.find
. If I needed a<br>precise number of mismatches I'd use motility.<br><br>cheers,<br>--titus<br><br>_______________________________________________<br>biology-in-python mailing list<br><a href="mailto:biology-in-python@lists.idyll.org">
biology-in-python@lists.idyll.org</a><br><a href="http://lists.idyll.org/listinfo/biology-in-python">http://lists.idyll.org/listinfo/biology-in-python</a><br></blockquote></div><br>