[pygr-notify] Issue 76 in pygr: ensembl.SeqRegion gives KeyError if missing coord system or bad seqID

codesite-noreply at google.com
Fri Mar 27 10:24:57 PDT 2009


Updates:
	Summary: ensembl.SeqRegion gives KeyError if missing coord system or bad seqID

Comment #8 on issue 76 by cjlee112: ensembl.SeqRegion gives KeyError if missing coord system or bad seqID
http://code.google.com/p/pygr/issues/detail?id=76

Hi Jenny,
I'm sorry this is such a frustrating process.  The reason you're still getting
this error message is that the table you added as coord system 15 does *not*
contain this sequence ID, so pygr correctly gives you the same error message.
The coordSystems you define in issue76.py provide only two of the five coord
systems required by ensembl.  It treats coord systems 4, 15 and 11 as if they
were identical in ensembl, which they are not (and similarly for coord systems
17 and 101).  Presumably ensembl would not have created 4, 15 and 11 as
separate coordinate systems if they were actually identical.
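
To make the aliasing problem concrete, here is a minimal sketch of a check that
flags coord system IDs that are missing or that point at one shared backing
object.  It uses plain dicts as stand-ins; check_coord_systems, table_a and
table_b are hypothetical names for illustration, not part of pygr or of
issue76.py.

```python
def check_coord_systems(coord_systems, required_ids):
    """Report coord system IDs that are missing or that share one backing object."""
    missing = sorted(cs_id for cs_id in required_ids if cs_id not in coord_systems)
    # Group coord system IDs by the identity of the object they map to.
    by_object = {}
    for cs_id, db in coord_systems.items():
        by_object.setdefault(id(db), []).append(cs_id)
    shared = sorted(sorted(ids) for ids in by_object.values() if len(ids) > 1)
    return missing, shared


# Hypothetical stand-ins for the two sequence tables actually supplied.
table_a = {"name": "table_a"}
table_b = {"name": "table_b"}

# Five coord system IDs, but only two distinct tables behind them.
coord_systems = {4: table_a, 15: table_a, 11: table_a, 17: table_b, 101: table_b}

missing, shared = check_coord_systems(coord_systems, [4, 11, 15, 17, 101])
print(missing)  # []
print(shared)   # [[4, 11, 15], [17, 101]]
```

Every required ID is present, so nothing shows up as missing, but the second
list shows that five IDs are served by only two tables.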

FYI, here is how I analyzed this error message to see exactly what's going on:
leec$ python -i issue76.py
Traceback (most recent call last):
   File "issue76.py", line 36, in <module>
     orientation='seq_region_strand'))
   File "/Users/leec/projects/pygr/pygr/annotation.py", line 137, in __init__
     (repr(seqDB),))
KeyError: ' cannot create annotation object; sequence database {} may not be correct'
>>> import pdb
>>> pdb.pm()
> /Users/leec/projects/pygr/pygr/annotation.py(137)__init__()
-> (repr(seqDB),))
(Pdb) k
292783L
(Pdb) self.sliceDB[k]
<pygr.classutil.TupleO_homo_sapiens_core_53_36o.exon object at 0x2db9030>
(Pdb) self.sliceDB[k].id
292783L
(Pdb) self.getSliceAttr(self.sliceDB[k], 'id')
225896L
(Pdb) self.seqDB[225896L]
*** KeyError: 'id 225896 non-existent or not unique'
(Pdb) self.seqDB.seqRegionDB[225896L]
<pygr.classutil.TupleO_homo_sapiens_core_53_36o.seq_region object at 0x10bfcd0>
(Pdb) self.seqDB.seqRegionDB[225896L].coord_system_id
15L
(Pdb) self.seqDB.prefixDict[15]
(Pdb) self.seqDB.prefixDict[15] is None
True
(Pdb) self.seqDB.coordSystems[15][225896L]
*** KeyError: 'id 225896 non-existent or not unique'
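
The chain of lookups in that pdb session can be sketched as a small function.
This is only an illustration using plain dicts in place of the pygr objects;
trace_lookup and the three dict names are hypothetical, and the empty dict for
coord system 15 stands in for a table that exists but lacks the sequence ID.

```python
def trace_lookup(slice_id, slice_to_seq, seq_region_to_cs, coord_systems):
    """Follow annotation -> sequence ID -> coord system -> sequence table."""
    seq_id = slice_to_seq[slice_id]      # sequence ID the annotation refers to
    cs_id = seq_region_to_cs[seq_id]     # coord system that sequence belongs to
    table = coord_systems.get(cs_id)     # table registered for that coord system
    if table is None:
        return "no table registered for coord system %d" % cs_id
    if seq_id not in table:
        return "coord system %d table has no sequence %d" % (cs_id, seq_id)
    return "ok"


# Values taken from the pdb session above; the containers are stand-ins.
slice_to_seq = {292783: 225896}
seq_region_to_cs = {225896: 15}
coord_systems = {15: {}}  # table present, but it does not contain 225896

result = trace_lookup(292783, slice_to_seq, seq_region_to_cs, coord_systems)
print(result)  # coord system 15 table has no sequence 225896
```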

I then verified directly using the mysql client that this ID is indeed missing
from the ensembl dna table:
mysql> use homo_sapiens_core_53_36o
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> desc dna;
+---------------+------------------+------+-----+---------+-------+
| Field         | Type             | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+-------+
| seq_region_id | int(10) unsigned | NO   | PRI | 0       |       |
| sequence      | longtext         | NO   |     |         |       |
+---------------+------------------+------+-----+---------+-------+
2 rows in set (0.17 sec)

mysql> select * from dna where seq_region_id=225896;
Empty set (0.17 sec)
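
The same kind of check can be run for all sequence regions at once with a left
join.  The sketch below uses an in-memory SQLite database from Python's
standard library rather than the real MySQL server, so it runs anywhere; the
table layouts are simplified and the row with ID 1000 is made-up sample data.

```python
import sqlite3

# In-memory stand-in for the ensembl core database (simplified schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY,
                             coord_system_id INTEGER);
    CREATE TABLE dna (seq_region_id INTEGER PRIMARY KEY,
                      sequence TEXT NOT NULL);
    INSERT INTO seq_region VALUES (225896, 15), (1000, 4);
    INSERT INTO dna VALUES (1000, 'ACGT');
""")

# List sequence regions that have no row in the dna table.
rows = conn.execute("""
    SELECT s.seq_region_id, s.coord_system_id
    FROM seq_region AS s
    LEFT JOIN dna AS d ON d.seq_region_id = s.seq_region_id
    WHERE d.seq_region_id IS NULL
    ORDER BY s.seq_region_id
""").fetchall()
print(rows)  # [(225896, 15)]
```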

It is unfortunate that ensembl has such a complicated set of coordinate systems
for their annotations.  However, to make your code work with their database,
you'll have to provide coord system interfaces that work successfully with the
coord systems that they mandate.
