[bip] fast sequence searching algorithm... in python
Titus Brown
titus at caltech.edu
Tue Sep 25 22:14:58 PDT 2007
On Tue, Sep 25, 2007 at 07:30:07PM -0700, Sagar Damle wrote:
-> Hi all,
-> I'm in need of a fast sequence matching algorithm for DNA/RNA/
-> protein sequences. My query searches are relatively short (<100bp)
-> and 'subject' sequence is on the order of 10kb. At the moment, I'm
-> using the span() function in python's regular expression module to
-> return all matches:
->
-> matches = [match.span() for match in
->            re.finditer(re.escape(str(query)), str(subject), re.IGNORECASE)]
->
-> This basically returns all coordinates of my query against subject
-> as a list of tuples (match start, match stop), but it's somewhat
-> sluggish for my needs. Is there a way to do it faster within python?
Hey Sagar,
a few questions...
1. Do you need full regular expressions, or can you just use
string.find?
2. Do you want to allow mismatches?
3. Are you looking for fixed-length matches, or do you want gapped
matching?
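Questions 2 and 3 distinguish fixed-length (ungapped) matching from gapped alignment. For the fixed-length case, one simple way to allow a few substitutions is a brute-force Hamming-distance scan. This is a minimal sketch under that assumption, not the algorithm BLAST or any particular library uses. The function name `find_with_mismatches` and its `max_mismatches` parameter are hypothetical. It handles no gaps, and it costs roughly len(subject) x len(query) character comparisons in the worst case:

```python
def find_with_mismatches(query, subject, max_mismatches=1):
    """Hypothetical helper: return (start, stop) spans where `query` lines up
    with `subject` without gaps and with at most `max_mismatches`
    substitutions. Comparison is case-insensitive."""
    q = query.upper()
    s = subject.upper()
    n = len(q)
    spans = []
    # Slide the query along every possible ungapped offset in the subject.
    for start in range(len(s) - n + 1):
        mismatches = 0
        for a, b in zip(q, s[start:start + n]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions; skip this offset early
        if mismatches <= max_mismatches:
            spans.append((start, start + n))
    return spans


print(find_with_mismatches("ACGT", "ACGAACGTTT", max_mismatches=1))
```

With one mismatch allowed, the example reports both the exact hit at offset 4 and the near-hit "ACGA" at offset 0. Setting `max_mismatches=0` reduces it to exact matching.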
I would probably just use BLAST for this myself, unless I only cared
about exact matches, in which case I'd use string.find. If I needed a
precise number of mismatches I'd use motility.
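For the exact-match case, the `string.find` approach can be sketched as a loop that returns the same kind of (start, stop) tuples as the regex one-liner in the question. In Python 2, `string.find` was a function in the `string` module. In current Python, the equivalent is the `str.find` method used below. The helper name `find_exact` is hypothetical. Case-insensitivity comes from upper-casing both strings, which is equivalent to `re.IGNORECASE` for plain ASCII sequence letters:

```python
def find_exact(query, subject):
    """Hypothetical helper: return (start, stop) spans of every exact,
    case-insensitive occurrence of `query` in `subject`, including
    overlapping occurrences."""
    q = query.upper()
    s = subject.upper()
    spans = []
    start = s.find(q)
    while start != -1:
        spans.append((start, start + len(q)))
        # Resume one position later so overlapping hits are reported too.
        start = s.find(q, start + 1)
    return spans


print(find_exact("aa", "AAAA"))
```

Note one behavioral difference: `re.finditer` only returns non-overlapping matches, so "AA" in "AAAA" yields two spans there but three here. If overlapping hits are not wanted, resuming at `start + len(q)` instead of `start + 1` gives the non-overlapping behavior.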
cheers,
--titus
More information about the biology-in-python mailing list