[pygr-notify] Issue 41 in pygr: TBLASTN parsing error

codesite-noreply at google.com codesite-noreply at google.com
Mon Sep 22 04:49:13 PDT 2008


Issue 41: TBLASTN parsing error
http://code.google.com/p/pygr/issues/detail?id=41

New issue report by fishfrogmonkey:
Bug in BLAST parser - from a git checkout on September 15th 2008

Background:

To get around a bug in tblastn, the parser takes the start position of the
sequence on each Sbjct line from the start position of the sequence on the
Query line above it.

However, if the query sequence on that line is just the single residue Q,
then the find matches the Q of "Query:" instead, and 0 is stored as the
offset of the start of the subject sequence. See the last Query/Sbjct pair
of the result below, followed by the traceback:

>ref|NC_007503.1| Carboxydothermus hydrogenoformans Z-2901, complete genome
          Length = 2401520

 Score = 90.5 bits (223), Expect = 4e-17,   Method: Compositional matrix adjust.
 Identities = 53/181 (29%), Positives = 97/181 (53%), Gaps = 12/181 (6%)
 Frame = +2

Query: 13    GKVLWQNLTFTISAGERVGIHAPSGTGKTTLGRVLAGWQKPTAGDVLLDGSPFPLHQYCP 72
             G+V+   +TFT+  G+ +G+  PSG GK++L R+L     PT+G++   G    + +Y P
Sbjct: 99509 GQVILDGITFTVEEGDFLGVLGPSGAGKSSLFRLLNRLLSPTSGEIYYRGK--NIKEYDP 99682

Query: 73    VQLVPQHPELTFNPWRSAGDAVRD--------AWQPDPETLRRL----HVQPEWLTRRPM 120
             ++L  +   +   P+      + D          +PD E + +     +++ E L ++P
Sbjct: 99683 IKLRREIGYVLQRPYLFGQKVLEDLTYPFRIRQEKPDMELIYKYLAQANLKEEILAKKPT 99862

Query: 121   QLSGGELARIAILRALDPRTRFLIADEMTAQLDPSIQKAIWVYVLEVCRSRSLGMLVISH 180
             +LSGGE  RI+++R L  + R L+ DE+T+ LD    +AI   +L+    ++L +L I+H
Sbjct: 99863 ELSGGEAQRISLIRTLLVQPRVLLLDEVTSALDLDTTRAILDLILKEKEEKNLTVLAITH 100042

Query: 181   Q 181

Sbjct: 100043N 100045

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 174, in <module>
    for t in p.parse_file(sys.stdin):
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 169, in parse_file
    self.save_subject_line(line)
  File "/usr/lib/python2.5/site-packages/pygr/parse_blast.py", line 80, in save_subject_line
    self.subject_end=int(c[3])
ValueError: invalid literal for int() with base 10: '100043N'
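
The offset problem can be reproduced outside the parser. The following is
only a minimal sketch, not pygr's actual code: it assumes that c is the
result of line.split() (the report does not show how c is built), and
naive_seq_start is a hypothetical helper name. The input lines are copied
from the BLAST output above.

```python
def naive_seq_start(line):
    """Mimic the lookup from the report: find the sequence token in the line.

    Assumption: the fields come from a plain whitespace split, so c[2] is
    the sequence segment of a "Query:" line.
    """
    c = line.split()
    return line.find(c[2])


# A normal Query line: the sequence is long, so find() locates it correctly.
long_query = ("Query: 121   "
              "QLSGGELARIAILRALDPRTRFLIADEMTAQLDPSIQKAIWVYVLEVCRSRSLGMLVISH 180")
# The failing pair: the query segment is the single residue "Q".
short_query = "Query: 181   Q 181"
sbjct_line = "Sbjct: 100043N 100045"

print(naive_seq_start(long_query))   # 13
print(naive_seq_start(short_query))  # 0, the "Q" of "Query:" matched first

# With the correct column (13) the subject residue can be cut out even though
# the start coordinate runs into it; a whitespace split cannot separate them.
print(sbjct_line[13:].split()[0])    # N
print(sbjct_line.split()[1])         # 100043N

try:
    int(sbjct_line.split()[1])
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '100043N'
```

Only a segment that is also a prefix of "Query:" triggers the bad match,
which is why the earlier lines of the same alignment parse without error.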


Possible Bugfix/workaround:

Line 70 in parse_blast.py

currently:
        self.seq_start_char=line.find(c[2]) # IN CASE BLAST SCREWS UP Sbjct:

could be:
        self.seq_start_char=line[1:].find(c[2])+1 # IN CASE BLAST SCREWS UP Sbjct:

This searches only from the second character onward, which avoids matching
the Q of "Query:".
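
For comparison, here is a small sketch of the proposed expression next to
two variants. It again assumes c = line.split(), and all three function
names are hypothetical, not part of pygr. The second variant uses the start
argument of str.find and should give the same result as the slice. The third
is one possible alternative that begins searching after the start
coordinate, so it does not depend on which characters the label contains.

```python
def seq_start_proposed(line):
    """The workaround from the report: skip the first character."""
    c = line.split()
    return line[1:].find(c[2]) + 1


def seq_start_find_from(line):
    """Same idea without the slice, using the start argument of str.find."""
    c = line.split()
    return line.find(c[2], 1)


def seq_start_after_coord(line):
    """Alternative sketch: search only after the label and start coordinate."""
    c = line.split()
    # End of the start coordinate, searching past the "Query:" label.
    coord_end = line.find(c[1], len(c[0])) + len(c[1])
    return line.find(c[2], coord_end)


short_query = "Query: 181   Q 181"
first_query = ("Query: 13    "
               "GKVLWQNLTFTISAGERVGIHAPSGTGKTTLGRVLAGWQKPTAGDVLLDGSPFPLHQYCP 72")

for f in (seq_start_proposed, seq_start_find_from, seq_start_after_coord):
    print(f.__name__, f(short_query), f(first_query))
# Each function reports column 13 for both lines.
```

One difference to be aware of: when the token is not found at all, the
slice form yields 0 while the two str.find forms yield -1.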


Issue attributes:
	Status: New
	Owner: ----
	Labels: Type-Defect Priority-Medium



