[bip] fast sequence searching algorithm... in python

Tue Sep 25 22:14:58 PDT 2007

On Tue, Sep 25, 2007 at 07:30:07PM -0700, Sagar Damle wrote:
-> Hi all,
->   I'm in need of a fast sequence matching algorithm for DNA/RNA/ 
-> protein sequences.  My query searches are relatively short (<100bp)  
-> and 'subject' sequence is on the order of 10kb.  At the moment, I'm  
-> using the span() function in python's regular expression module to  
-> return all matches:
-> 
-> matches = [match.span() for match in
->            re.finditer(re.escape(str(query)), str(subject), re.IGNORECASE)]
-> 
->   This basically returns all coordinates of my query against subject  
-> as a list of tuples (match start, match stop), but it's somewhat  
-> sluggish for my needs.  Is there a way to do it faster within python?

Hey Sagar,

A few questions:

1. Do you need full regular expressions, or can you just use
string.find?

2. Do you want to allow mismatches?

3. Are you looking for fixed-length matches, or do you want gapped
matching?

I would probably just use BLAST for this myself, unless I only cared
about exact matches, in which case I'd use string.find.  If I needed a
precise number of mismatches I'd use motility.
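
For the exact-match case, here is a minimal sketch of what the string.find
approach could look like, using the built-in string find method (str.find in
current Python). The function name find_all and the overlapping flag are made
up for illustration, and it assumes plain ASCII sequences, so that upper-casing
both strings for case-insensitive comparison does not shift any coordinates.
It has not been timed against the regular expression version.

```python
def find_all(query, subject, overlapping=False):
    """Return (start, stop) spans of every exact, case-insensitive match.

    Hypothetical helper: assumes ASCII sequence data, so upper-casing
    never changes string length or match coordinates.
    """
    query = str(query).upper()
    subject = str(subject).upper()
    if not query:
        return []  # an empty query would otherwise match at every position

    # Skip past the whole match (like re.finditer) or advance by one
    # character to also report overlapping matches.
    step = 1 if overlapping else len(query)

    spans = []
    pos = subject.find(query)
    while pos != -1:
        spans.append((pos, pos + len(query)))
        pos = subject.find(query, pos + step)
    return spans


matches = find_all("aca", "ACACAgtaca")
```

With overlapping=False this should give the same (match start, match stop)
tuples as the re.finditer one-liner; overlapping=True also reports matches
that share bases with an earlier match, which re.finditer skips.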

cheers,
--titus