[pygr-notify] [pygr commit] r101 - wiki
codesite-noreply at google.com
Wed Sep 10 19:01:38 PDT 2008
Author: jqian.ubc
Date: Wed Sep 10 19:01:27 2008
New Revision: 101
Modified:
wiki/PygrOnEnsembl.wiki
Log:
Edited wiki page through web user interface.
Modified: wiki/PygrOnEnsembl.wiki
==============================================================================
--- wiki/PygrOnEnsembl.wiki (original)
+++ wiki/PygrOnEnsembl.wiki Wed Sep 10 19:01:27 2008
@@ -1,8 +1,8 @@
-#summary using pygr to develop an Ensembl API
+#summary using Pygr to develop an Ensembl API
= Introduction =
-The Ensembl database system is a central data repository for various
eukaryotic genome sequences and their annotated information
[http://www.ensembl.org Ensembl Home]. The screenshots of schema diagrams
for the four basic types of databases (core, compara, variation and
funcgen) can be found at:
[http://groups.google.com/group/pygr-dev/files?hl=en pygr-dev files]. They
were created using the files in the sql/
+The Ensembl database system is a central data repository for various
eukaryotic genome sequences and their annotated information
[http://www.ensembl.org Ensembl Home]. The screen shots of schema diagrams
for the four basic types of databases (core, compara, variation and
funcgen) can be found at:
[http://groups.google.com/group/pygr-dev/files?hl=en pygr-dev files]. They
were created using the files in the sql/
directory of the ensembl CVS module. The
[http://pygr-dev.googlegroups.com/web/table.sql?gda=M3ILbjoAAABJgcRQ_B738LYip0lXSox5BrGVnIRWNUQzXUPZ5KyWuGG1qiJ7UbTIup-M2XPURDTDvhSABxKrnfEc_FGQElaK
table.sql] file gives the table
definitions and the
[http://pygr-dev.googlegroups.com/web/foreign_keys.sql?gda=K02TckEAAABJgcRQ_B738LYip0lXSox5BrGVnIRWNUQzXUPZ5KyWuGG1qiJ7UbTIup-M2XPURDRvOefWPvoIMlEIkd9UdRbQLTxVVTd9FLrlvrrz00ZndA
foreign_keys.sql] file gives the foreign key definitions. Efficient access to
Ensembl's many large databases is essential to a wide range of genome
research projects. Currently, the Ensembl databases are accessed mostly
through a Perl API or a (less developed) Java API. No equivalent Python API
is available yet.
@@ -65,37 +65,42 @@
*Framework*
-*1.* the datamodel.py module
-a BaseModel super class and its subclasses. Each subclass represents a
biological entity.
+*1.* the datamodel module (datamodel.py):
+- a generic datamodel class, BaseModel (the super class). It is a subclass
of Pygr's sqlgraph.TupleO.
+- specialized datamodel classes (subclasses of BaseModel). Each subclass
represents a biological entity, or an Ensembl row/item object.
+- a generic Feature class. It represents a generic Ensembl feature, i.e. an
object that has the attributes seq_region_id, seq_region_start,
seq_region_end and seq_region_strand. Its get_sequence() method is
implemented using Pygr's seqdb.AnnotationDB.
+- specialized feature classes (subclasses of Feature). The relationships
between features are implemented using Pygr's sqlgraph.SQLGraph.
+
+*2.* the adaptor module (adaptor.py):
+- a Registry class: provides a connection to the Ensembl MySQL server
+- specialized adaptor classes (subclasses of Pygr's sqlgraph.SQLTable
class): each provides access to a specific SQL table in an Ensembl core database.
+- private module functions: automatically save the Ensembl database
schema to pygr.Data
-*2.* the adaptor.py module
-a Registry class, a generic adaptor class (super class) and many
specialized adaptor classes (sub classes). Each specialized adaptor class
employs pygr modules (mainly the sqlgraph and seqdb module) and provides
access to its corresponding sql table in an ensembl core database.
+*3.* the featuremapping module (featuremapping.py): provides mappings
between Ensembl features
-*3.* the featuremapping.py module
-
-*4.* the supporting module (seqregion.py): extensions of the pygr core
modules.
+*4.* the supporting module (seqregion.py): provides mapping between a
sequence slice and the set of Ensembl features in the slice.
*Design Pattern*
-The Driver class in the adaptor module is implemented as a singleton
class, since making a connection to the database is expensive.
+The Registry class in the adaptor module is implemented as a singleton
class, since making a connection to the server is expensive.
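The singleton idea can be sketched as follows. This is a minimal illustration, not the adaptor module's actual code: the class name mirrors Registry, and the `connect` argument is a hypothetical placeholder for whatever opens the MySQL connection.

```python
class Registry(object):
    """Hypothetical singleton: every construction returns the same instance."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Create the shared instance only once; later calls reuse it.
        if cls._instance is None:
            cls._instance = super(Registry, cls).__new__(cls)
            cls._instance._initialized = False
        return cls._instance

    def __init__(self, host, user, connect=None):
        # Only the first construction performs the expensive setup.
        if self._initialized:
            return
        self.host = host
        self.user = user
        # `connect` is a placeholder (assumption) for the real connection factory.
        self.connection = connect(host, user) if connect else None
        self._initialized = True
```

Constructing it twice returns the same object and opens only one connection. One trade-off of this design is that arguments passed on later calls (for example, a different host) are silently ignored.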
= Implemented Functionality =
-The latest ensembl API allows the user to perform the following tasks:
+The latest Ensembl API allows the user to perform the following tasks:
*General methods*
Create a connection to the ensembl MySQL server:
-serverRegistry = get_registry(host='ensembldb.ensembl.org',
user='anonymous')
+`serverRegistry = get_registry(host='ensembldb.ensembl.org',
user='anonymous')`
Create access to an ensembl core database:
-coreDBAdaptor =
serverRegistry.get_DBAdaptor('homo_sapiens', 'core', '47_36i')
+`coreDBAdaptor =
serverRegistry.get_DBAdaptor('homo_sapiens', 'core', '47_36i')`
Retrieve a sequence object:
-coreDBAdaptor.fetch_slice_by_seqregion(coordSystemName, seqregionName)
+`coreDBAdaptor.fetch_slice_by_seqregion(coordSystemName, seqregionName)`
-coordSystemName: 'chromosome' or 'contig'
-seqregionName: a chromosome name, such as '1'
@@ -105,17 +110,17 @@
Create access to any table in an ensembl core database:
e.g.
-transcriptAdaptor = coreDBAdaptor.get_adaptor('transcript') will return a
transcriptAdaptor object that can be used to access any record/item in the
transcript table.
+`transcriptAdaptor = coreDBAdaptor.get_adaptor('transcript')` will return
a transcriptAdaptor object that can be used to access any record/item in
the transcript table.
Create access to any record in an ensembl sql table:
e.g.
-transcript = transcriptAdaptor[1] will return a transcript item with the
unique dbID 1
+`transcript = transcriptAdaptor[1]` will return a transcript item with the
unique dbID 1
Create access to any column of an ensembl sql table record:
e.g.
-transcript.seq_region_start will return the seq_region_start value of the
give transcript
+`transcript.seq_region_start` will return the seq_region_start value of
the given transcript
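The table-as-dictionary access pattern described above (adaptor indexed by dbID, columns read as attributes) can be illustrated without an Ensembl server. The sketch below is a toy stand-in built on Python's sqlite3 module, not Pygr's sqlgraph.SQLTable. The class names ToyTableAdaptor and Row are hypothetical, and the row values are made up.

```python
import sqlite3


class Row(object):
    """Expose one table row's columns as attributes (e.g. row.seq_region_start)."""

    def __init__(self, columns, values):
        for name, value in zip(columns, values):
            setattr(self, name, value)


class ToyTableAdaptor(object):
    """Hypothetical dict-like view of one SQL table, keyed by its primary key."""

    def __init__(self, conn, table, primary_key):
        self.conn = conn
        self.table = table
        self.primary_key = primary_key

    def __getitem__(self, db_id):
        # Table and key names come from trusted code; the id is a bound parameter.
        cursor = self.conn.execute(
            'SELECT * FROM %s WHERE %s = ?' % (self.table, self.primary_key),
            (db_id,))
        values = cursor.fetchone()
        if values is None:
            raise KeyError(db_id)
        columns = [d[0] for d in cursor.description]
        return Row(columns, values)


# A tiny in-memory table with a few transcript-like columns (made-up values).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, '
             'seq_region_id INTEGER, seq_region_start INTEGER, '
             'seq_region_end INTEGER, seq_region_strand INTEGER)')
conn.execute('INSERT INTO transcript VALUES (1, 1, 1000, 2500, 1)')

transcriptAdaptor = ToyTableAdaptor(conn, 'transcript', 'transcript_id')
transcript = transcriptAdaptor[1]
print(transcript.seq_region_start)  # 1000
```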
*Methods for an ensembl feature object*
@@ -123,69 +128,69 @@
An ensembl feature refers to an object that has the attributes of
seq_region_id, seq_region_start, seq_region_end and seq_region_strand.
Retrieve the sequence of an ensembl feature:
-get_sequence()
+`get_sequence()`
e.g.
-gene.get_sequence() will return a sequence object of the given gene.
+`gene.get_sequence()` will return a sequence object of the given gene.
Optional argument for this method: the length of the flanking region to
include on both sides of the feature sequence:
e.g.
-gene.get_sequence(500) will return the sequence of the gene plus 500bp
flanking regions on both sides of the gene.
+`gene.get_sequence(500)` will return the sequence of the gene plus 500bp
flanking regions on both sides of the gene.
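How a flank length could translate into sequence coordinates is sketched below on a plain string. The function feature_sequence is hypothetical: it assumes 1-based, inclusive seq_region coordinates, clips the flanked interval to the ends of the region, and reverse-complements features on strand -1. Whether get_sequence() clips or handles strand exactly this way is an assumption.

```python
def feature_sequence(region_seq, start, end, strand, flank=0):
    """Hypothetical stand-in for get_sequence() on a plain string.

    start/end are 1-based, inclusive seq_region coordinates.
    The flanked interval is clipped to the ends of the region.
    """
    lo = max(1, start - flank)
    hi = min(len(region_seq), end + flank)
    seq = region_seq[lo - 1:hi]  # convert to a 0-based, half-open slice
    if strand == -1:
        # Features on the reverse strand are read as the reverse complement.
        complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
        seq = ''.join(complement[base] for base in reversed(seq))
    return seq
```

For example, `feature_sequence('AACCGGTTAA', 3, 6, 1, flank=1)` returns 'ACCGGT'.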
Find all the feature objects in a particular slice:
-fetch_all_by_slice(slice)
+`fetch_all_by_slice(slice)`
e.g.
-transcriptAdaptor.fetch_all_by_slice(slice) will retrieve all the
transcripts in the give slice.
+`transcriptAdaptor.fetch_all_by_slice(slice)` will retrieve all the
transcripts in the given slice.
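One way to select the features in a slice is an interval-overlap query on the seq_region columns. The sketch below uses sqlite3 and a toy transcript table with made-up rows. The function name fetch_overlapping is hypothetical, and it assumes "in the slice" means "overlapping the slice" on the same seq_region. It ignores mapping between coordinate systems (e.g. contig to chromosome), which a real implementation would need.

```python
import sqlite3


def fetch_overlapping(conn, table, seq_region_id, slice_start, slice_end):
    """Hypothetical sketch: ids of rows overlapping a slice (1-based, inclusive).

    Assumes the primary key column is named <table>_id, as in the toy table.
    """
    # Two closed intervals overlap when each starts before the other ends.
    query = ('SELECT %s_id FROM %s WHERE seq_region_id = ? '
             'AND seq_region_start <= ? AND seq_region_end >= ? '
             'ORDER BY seq_region_start' % (table, table))
    rows = conn.execute(query, (seq_region_id, slice_end, slice_start))
    return [row[0] for row in rows]


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, '
             'seq_region_id INTEGER, seq_region_start INTEGER, '
             'seq_region_end INTEGER)')
conn.executemany('INSERT INTO transcript VALUES (?, ?, ?, ?)', [
    (1, 1, 100, 200),  # ends before the slice
    (2, 1, 150, 400),  # overlaps the slice's left edge
    (3, 1, 250, 300),  # lies inside the slice
    (4, 2, 250, 300),  # on a different seq_region
])
print(fetch_overlapping(conn, 'transcript', 1, 201, 350))  # [2, 3]
```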
Retrieve the stable_id, created_date, modified_date or version of a
gene/transcript/translation/exon:
e.g.
-gene.get_stable_id() will return the ensembl stable_id for the given gene
+`gene.get_stable_id()` will return the ensembl stable_id for the given gene
Obtain a gene object:
-transcript.get_gene()
-geneAdaptor.fetch_by_stable_id(geneStableID)
+`transcript.get_gene()`
+`geneAdaptor.fetch_by_stable_id(geneStableID)`
Obtain transcript objects:
-gene.get_transcripts()
-exon.get_all_transcripts()
-translation.get_transcript()
-transcriptAdaptor.fetch_by_stable_id(transcriptStableID)
+`gene.get_transcripts()`
+`exon.get_all_transcripts()`
+`translation.get_transcript()`
+`transcriptAdaptor.fetch_by_stable_id(transcriptStableID)`
Obtain exon objects:
-transcript.get_all_exons()
-exonAdaptor.fetch_by_stable_id(exonStableID)
+`transcript.get_all_exons()`
+`exonAdaptor.fetch_by_stable_id(exonStableID)`
Obtain a translation object:
-transcript.get_translation()
-translationAdaptor.fetch_by_stable_id(translationStableID)
+`transcript.get_translation()`
+`translationAdaptor.fetch_by_stable_id(translationStableID)`
Obtain a spliced sequence object:
-transcript.get_spliced_seq()
+`transcript.get_spliced_seq()`
Obtain a five-prime untranslated region:
-transcript.get_five_utr()
+`transcript.get_five_utr()`
Obtain a three-prime untranslated region:
-transcript.get_three_utr()
+`transcript.get_three_utr()`
Obtain a prediction_transcript object:
-predictionExon.get_prediction_transcript()
+`predictionExon.get_prediction_transcript()`
Obtain prediction_exon objects:
-predictionTranscript.get_all_prediction_exons()
+`predictionTranscript.get_all_prediction_exons()`
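The relationship between get_spliced_seq(), get_five_utr() and get_three_utr() can be sketched on plain strings. The helpers below are hypothetical: they assume exon coordinates are 1-based and inclusive on the forward strand, and that the coding region is given as positions within the spliced sequence. Ensembl's translation records instead refer to a start exon and an end exon with offsets, so a real implementation would convert those first.

```python
def reverse_complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
    return ''.join(complement[base] for base in reversed(seq))


def spliced_seq(region_seq, exons, strand):
    """Concatenate exon sequences in transcript order (hypothetical sketch).

    exons: list of (start, end) pairs, 1-based inclusive, forward-strand coordinates.
    """
    # Transcript order runs 5' to 3': ascending on +1, descending on -1.
    ordered = sorted(exons, reverse=(strand == -1))
    pieces = [region_seq[start - 1:end] for start, end in ordered]
    if strand == -1:
        pieces = [reverse_complement(piece) for piece in pieces]
    return ''.join(pieces)


def split_utrs(spliced, coding_start, coding_end):
    """Split a spliced sequence into (5' UTR, coding sequence, 3' UTR).

    coding_start/coding_end are 1-based, inclusive positions within `spliced`.
    """
    return (spliced[:coding_start - 1],
            spliced[coding_start - 1:coding_end],
            spliced[coding_end:])
```

With region 'CCATGGCTTTTATAAGG' and exons (1, 7) and (12, 17) on strand 1, the spliced sequence is 'CCATGGCATAAGG', and `split_utrs(spliced, 3, 11)` gives ('CC', 'ATGGCATAA', 'GG').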
Additional sample code can be found under major methods in both the
adaptor.py module and the datamodel.py module, in the form of doctests.
@@ -195,7 +200,7 @@
*1.* The latest Ensembl API tarball Qing_Qian.tar.gz can be downloaded
from
[http://code.google.com/p/google-summer-of-code-2008-psf/downloads/list#].
For the prerequisites and installation details, please refer to the README
file.
-Alternatively, the current ensembl API code, together with pygr, can be
retrieved from the public git repository. To check out a copy, run the
following instruction on the command line:
+Alternatively, the current Ensembl API code, together with Pygr, can be
retrieved from the public git repository. To check out a copy, run the
following command:
`git clone git://iorich.caltech.edu/git/public/pygr-jenny <dirname of your
choice>`