[pygr-notify] Issue 44 in pygr: tblastn and blastx support
codesite-noreply at google.com
codesite-noreply at google.com
Wed Jan 7 13:50:12 PST 2009
Updates:
Cc: marecki
Labels: -Priority-Medium Priority-Low Milestone-Release0.8
Comment #3 on issue 44 by cjlee112: tblastn and blastx support
http://code.google.com/p/pygr/issues/detail?id=44
Titus requested some usage examples, so here goes:
# BASIC USAGE: creating a mapping object for any seqDB
# that you want to search using BLAST
blastmap = BlastMapping(seqDB)
# blastmap acts like a Pygr graph (i.e. dict)
# that takes any sequence object as a source node (key)
# and maps them to homologous sequence intervals from seqDB
# as target nodes, with edge information about each alignment,
# following the generic pattern blastmap[src][target] = edgeInfo.
# You can use it either in the mapping style:
# myquery can be any Pygr sequence interval object...
for src,target,edge in blastmap[myquery].edges():
# print the aligned intervals or whatever you want to do...
print '%s %s\n%s %s\n' %(src,repr(src),target,repr(target))
# or you use it as a callable, so you can pass
# search parameters like expmax etc.
# This also lets you save many search results into
# one NLMSA...
for myquery in lotsaQueries:
blastmap(myquery, al=myNLMSA)
myNLMSA.build()
Since this uses NLMSA to store the results, any of the NLMSASlice group-by
operators
can be applied as usual.
BlastMapping determines which flavor of BLAST (blastp or blastn) to run
based on the
sequence type (nucleotide vs. protein) of the query and database.
Currently, tblastn
and blastx won't work correctly, because of the translation issues outlined
above.
The new proposal would make tblastn work exactly the same as blastp and
blastn (i.e.
as in the example code above), but with one extra feature: the aligned
target
intervals (which look like protein sequence intervals, even though the
target
database is nucleotide) would have a "sequence" attribute which yields the
associated
nucleotide sequence interval from the target sequence database, e.g. to
print the
target *nucleotide sequence coordinates* next to the target *protein
sequence
string*, you would just change the code trivially:
for src,target,edge in blastmap[myquery].edges():
# print the aligned intervals or whatever you want to do...
print '%s %s\n%s %s %2.1f\n' %(src,repr(src),target,
repr(target.sequence),
100.*edge.pIdentity())
This example also prints the percent identity based of the *protein*
sequence of
target vs. src.
Explanation: the target sequence interval is not an interval of seqDB of
course,
because tblastn translates the subject nucleotide sequence into a protein
sequence.
target represents that protein sequence interval; you can slice it, extract
its
protein sequence string (by str()) etc. in all the usual ways. But it is
actually an
Annotation sequence object, bound to a real sequence object, namely its
associated
interval of nucleotide sequence from seqDB, which as always with an
annotation, you
can obtain by requesting the "sequence" attribute. This takes advantage of
the fact
that in Pygr an annotation is actually a subclass of sequence (SeqPath), so
all the
sequence operation methods come along for free...
--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings
More information about the pygr-notify
mailing list