[pygr-notify] [pygr commit] r67 - wiki
codesite-noreply at google.com
Tue Jul 8 09:42:34 PDT 2008
Author: ramccreary
Date: Tue Jul 8 09:41:10 2008
New Revision: 67
Modified:
wiki/DataStorageUsingpygr.wiki
Log:
Edited wiki page through web user interface.
Modified: wiki/DataStorageUsingpygr.wiki
==============================================================================
--- wiki/DataStorageUsingpygr.wiki (original)
+++ wiki/DataStorageUsingpygr.wiki Tue Jul 8 09:41:10 2008
@@ -11,7 +11,7 @@
The OptionParser class, taken from the optparse module, processes
command line arguments. The filenames (a .fna file and a .gff file, in
this case) are supplied on the command line as the values of named
options. If the value of either option is None, indicating that no
filename was supplied, the optparse help text is printed, along with
the available option names. An option was added for each file and
given a name that represents it during parsing.
-In this step, the genome is loaded into the annotation database. The
BlastDB module establishes a BLAST database for the genome.
+In this step, the genome is loaded into a BLAST database. The BlastDB
module establishes a BLAST database for the genome.
{{{
#! /usr/bin/env python
@@ -174,10 +174,11 @@
Finally, the MySQL database for the annotation is built, and saved
under the supplied database name. conn.commit() commits the current
transaction and makes the changes permanent.
+Here, slicedb uses the annotation objects from the SQLTable, which
correspond to the gene sequences in the BLAST database previously
constructed that contains the genome. AnnotationDB uses the annotations
as keys within a dictionary, and the values are the annotation objects,
which are similar to sequence intervals in that they represent segments
of the genome, but have annotation data associated with them. The two
containers supplied for AnnotationDB are the slicedb, which contains
the SQL table that holds the list of annotation objects, and the
sequence database for the E. coli sequence, which holds the sequence
intervals.
-Here, slicedb uses the annotation objects from the SQLTable, which
correspond to the gene sequences in the BLAST database previously
constructed that contains the genome. AnnotationDB uses the sequence
intervals are keys within a dictionary, and the values are the
annotation objects, which are similar to sequence intervals in that
they represent segments of the genome, but have annotation data
associated with them. The two containers supplied for AnnotationDB are
the slicedb, which contains the SQL table that holds the list of
annotation objects, and sequence database for the E. coli sequence.
+It should be noted that the sequence intervals and the annotation
objects are NOT stored in the same database. The annotations and their
corresponding IDs are stored within the annotation database, while the
related sequence intervals are stored within the sequence database,
also with their own IDs. However, the annotation objects have a
sequence attribute that enables the annotation's matching sequence
interval to be given as well.
-Then, a dictionary is created to hold the annotation database and the
genome database together. PrefixUnionDict provides a cohesive interface
to access the data in the two databases.
+Then, a dictionary is created to hold the annotations together with
the sequences they correspond to. PrefixUnionDict provides a
cohesive interface to access the data in the two databases.
Finally, an annotation map is created, with the annotations added. The
nested list data structure shortens the time needed to scan the
intervals by storing overlapping intervals in a more efficient,
hierarchical format. The annotations are then mapped to the segment of
the genome to which they correspond.
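
The separation described in this revision (annotations in one store, sequence intervals in another, linked through a sequence attribute and reachable through one prefixed lookup) can be illustrated with a small plain-Python sketch. This is not pygr code: the Annotation class, the lookup helper, the "seq"/"annot" prefixes, and the sample IDs and sequence are all hypothetical stand-ins chosen only to show the idea.

```python
# Illustrative stand-ins only: these names are hypothetical, not pygr's API.

class Annotation(object):
    """An annotation that refers to, but does not store, a sequence interval."""

    def __init__(self, ann_id, seq_id, start, stop, seq_db):
        self.id = ann_id
        self.seq_id = seq_id
        self.start = start
        self.stop = stop
        self._seq_db = seq_db  # reference to the separate sequence store

    @property
    def sequence(self):
        # Resolve the matching interval on demand from the sequence store.
        return self._seq_db[self.seq_id][self.start:self.stop]


def lookup(prefixed_id, databases):
    """Resolve an ID of the form 'prefix.key' against a dict of named stores."""
    prefix, key = prefixed_id.split(".", 1)
    return databases[prefix][key]


# Two separate stores, each with its own IDs.
sequence_db = {"genome1": "ATGGCGTTTAAACCCGGG"}
annotation_db = {"gene1": Annotation("gene1", "genome1", 0, 9, sequence_db)}

# One combined view over both stores, addressed by prefix.
databases = {"seq": sequence_db, "annot": annotation_db}
```

Here annotation_db["gene1"].sequence yields the interval from sequence_db, while lookup("annot.gene1", databases) and lookup("seq.genome1", databases) reach either store through a single interface.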