[pygr-notify] Issue 41 in pygr: TBLASTN parsing error
codesite-noreply at google.com
Mon Sep 22 04:49:13 PDT 2008
Issue 41: TBLASTN parsing error
http://code.google.com/p/pygr/issues/detail?id=41
New issue report by fishfrogmonkey:
Bug in the BLAST parser, seen in a git checkout from September 15th, 2008.
Background:
To get around a bug in tblastn, the parser takes the start position of the
sequence on each Sbjct: line from the start position of the sequence on the
Query: line above it. However, if the sequence on the Query: line is just the
single residue Q, then find() matches the Q of "Query:" and the parser stores
0 as the offset of the start of the subject sequence. See the last alignment
block of the result below, followed by the traceback:
>ref|NC_007503.1| Carboxydothermus hydrogenoformans Z-2901, complete genome
Length = 2401520
Score = 90.5 bits (223), Expect = 4e-17, Method: Compositional matrix adjust.
Identities = 53/181 (29%), Positives = 97/181 (53%), Gaps = 12/181 (6%)
Frame = +2
Query: 13 GKVLWQNLTFTISAGERVGIHAPSGTGKTTLGRVLAGWQKPTAGDVLLDGSPFPLHQYCP 72
G+V+ +TFT+ G+ +G+ PSG GK++L R+L PT+G++ G + +Y P
Sbjct: 99509 GQVILDGITFTVEEGDFLGVLGPSGAGKSSLFRLLNRLLSPTSGEIYYRGK--NIKEYDP 99682
Query: 73 VQLVPQHPELTFNPWRSAGDAVRD--------AWQPDPETLRRL----HVQPEWLTRRPM 120
++L + + P+ + D +PD E + + +++ E L ++P
Sbjct: 99683 IKLRREIGYVLQRPYLFGQKVLEDLTYPFRIRQEKPDMELIYKYLAQANLKEEILAKKPT 99862
Query: 121 QLSGGELARIAILRALDPRTRFLIADEMTAQLDPSIQKAIWVYVLEVCRSRSLGMLVISH 180
+LSGGE RI+++R L + R L+ DE+T+ LD +AI +L+ ++L +L I+H
Sbjct: 99863 ELSGGEAQRISLIRTLLVQPRVLLLDEVTSALDLDTTRAILDLILKEKEEKNLTVLAITH 100042
Query: 181 Q 181
Sbjct: 100043N 100045
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 174, in <module>
    for t in p.parse_file(sys.stdin):
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 169, in parse_file
    self.save_subject_line(line)
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 80, in save_subject_line
    self.subject_end=int(c[3])
ValueError: invalid literal for int() with base 10: '100043N'
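The failure can be reproduced outside pygr. The following is only a minimal sketch, not the actual parse_blast.py code: the helper names sequence_offset and split_subject_line are hypothetical, and the re-splitting step is an assumption inferred from the traceback (the parser appears to rebuild the Sbjct: fields from the stored offset when the start coordinate has run into the sequence). The spacing of the two sample lines is also an assumption, chosen so that Q and N both sit at column 13, since the column alignment of the report has been collapsed above.

```python
# Minimal reproduction sketch; helper names are hypothetical, not pygr's API.

def sequence_offset(query_line):
    """Locate the sequence column the way the report describes: a plain find()."""
    fields = query_line.split()
    # Searches the whole line, so the label "Query:" itself can match.
    return query_line.find(fields[2])

def split_subject_line(subject_line, seq_start):
    """Assumed recovery step: when the start coordinate has run into the
    sequence (fewer than 4 fields), cut the line at seq_start instead."""
    fields = subject_line.split()
    if len(fields) < 4:
        fields = (["Sbjct:", subject_line[6:seq_start].strip()]
                  + subject_line[seq_start:].split())
    return fields

query = "Query: 181   Q 181"       # sequence is the single residue Q
subject = "Sbjct: 100043N 100045"  # no space between coordinate and sequence

offset = sequence_offset(query)
print(offset)                      # offset of the Q in "Query:", not column 13

fields = split_subject_line(subject, offset)
print(fields)

try:
    subject_end = int(fields[3])
except ValueError as exc:
    print("ValueError:", exc)
```

With the correct offset of 13, the same helper would split the subject line into the coordinate 100043, the residue N and the end coordinate 100045.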
Possible Bugfix/workaround:
Line 70 in parse_blast.py
currently:
self.seq_start_char=line.find(c[2]) # IN CASE BLAST SCREWS UP Sbjct:
could be:
self.seq_start_char=line[1:].find(c[2])+1 # IN CASE BLAST SCREWS UP Sbjct:
This searches only from the second character onward, so the Q of Query: can
no longer match.
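An equivalent way to express the workaround is to pass a start index to str.find() instead of slicing the line. This is a sketch only: the function name sequence_offset_fixed is hypothetical, and starting the search after the whole "Query:" label (rather than at index 1) is my own variation, not something taken from parse_blast.py. The sample lines again assume the sequence starts at column 13.

```python
# Sketch of the workaround using the start argument of str.find().
# sequence_offset_fixed is a hypothetical name, not part of pygr.

def sequence_offset_fixed(query_line):
    """Return the column where the sequence starts, ignoring the line label."""
    fields = query_line.split()
    label_length = len(fields[0])  # length of "Query:"
    # Begin the search after the label so none of its letters can match.
    return query_line.find(fields[2], label_length)

short_line = "Query: 181   Q 181"
long_line = "Query: 13    GKVLW 17"

print(sequence_offset_fixed(short_line))
print(sequence_offset_fixed(long_line))

# The slicing form proposed above gives the same column for these lines.
print(short_line[1:].find("Q") + 1)
```

One difference between the two forms: when the sequence is not found, find() with a start index still returns -1, while the slicing form returns 0 (-1 + 1), which is indistinguishable from a match at the start of the line.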
Issue attributes:
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium