[pygr-notify] [pygr commit] r72 - wiki

codesite-noreply at google.com
Wed Jul 9 20:07:14 PDT 2008


Author: cjlee112
Date: Wed Jul  9 20:04:30 2008
New Revision: 72

Modified:
   wiki/SolexaTools.wiki

Log:
Edited wiki page through web user interface.

Modified: wiki/SolexaTools.wiki
==============================================================================
--- wiki/SolexaTools.wiki	(original)
+++ wiki/SolexaTools.wiki	Wed Jul  9 20:04:30 2008
@@ -12,18 +12,10 @@

 = Datasets =

-*Namshin Kim*: Here is the situation. I have 1G, a billion, reads of 36bp. I want to scan
-the alignments by genomic coordinates, and then I can see genomic variations
-in detail. I can make variation calling module or basic module for genome
-browser. I am already working on it.
-
-
+*Namshin Kim*: Here is the situation. I have 1G, a billion, reads of 36bp.

 = Analyses =
-*Namshin Kim*: For the solexa data processing, I am trying to save them into axtNet format
-in pairwiseMode. Of course I can save as annotation database, but I thought
-it would be much useful. Correct me if there is another way to do, maybe
-combination of annotation database and seqdb?
+*Namshin Kim*:  I want to scan the alignments by genomic coordinates, and then I can see genomic variations in detail. I can make variation calling module or basic module for genome browser. I am already working on it.For the solexa data processing, I am trying to save them into axtNet format in pairwiseMode. Of course I can save as annotation database, but I thought it would be much useful. Correct me if there is another way to do, maybe combination of annotation database and seqdb?

 = Problems We Must Solve =
  *Namshin Kim*: Here are the problems. Assume that I decided to save them as pygr-aware
@@ -39,26 +31,12 @@
  Shawn Cokus in Matteo Pellegrini's lab has developed a probabilistic mapping algorithm that is fast, scalable, and accurate.  We've used this for an alternative splicing Solexa analysis.  I've mentioned to Shawn that it would be interesting to incorporate this into Pygr with an NLMSA-like interface.

 == Database Classes ==
-*Chris Lee*: I think we should consider having a sequence database class
-optimized for huge numbers of fixed length reads (like Solexa).
+*Chris Lee*: I think we should consider having a sequence database class optimized for huge numbers of fixed length reads (like Solexa).

   * each sequence would have a int ID assigned in ascending order
+  * you would initialize the DB by specifying a max length per sequence.  It would then store sequences in a disk file as fixed length blocks of exactly that size.  It can then fseek() directly to the right block just based on the numerical ID of the sequence.
+  * if we wanted we could eventually develop flavors that store the data using 2 bit or other reduced representation to save space.
+  * you can add more sequences at any time, and I guess you could remove sequences as well, although I don't know what use that would be.
+  * the Solexa assigned string ID of each sequence could be kept in a separate file on more or less the same principle, so that one can map from number ID to string ID quickly and easily.  The reverse mapping implies using shelve or some equivalent.

-  * you would initialize the DB by specifying a max length per
-sequence.  It would then store sequences in a disk file as fixed
-length blocks of exactly that size.  It can then fseek() directly to
-the right block just based on the numerical ID of the sequence.
-
-  * if we wanted we could eventually develop flavors that store the data
-using 2 bit or other reduced representation to save space.
-
-  * you can add more sequences at any time, and I guess you could remove
-sequences as well, although I don't know what use that would be.
-
-  * the Solexa assigned string ID of each sequence could be kept in a
-separate file on more or less the same principle, so that one can map
-from number ID to string ID quickly and easily.  The reverse mapping
-implies using shelve or some equivalent.
-
-  * the interface to the SolexaDB would be the same as BlastDB, of
-course.  Or maybe this would just be another subclass of BlastDBbase...
\ No newline at end of file
+  * the interface to the SolexaDB would be the same as BlastDB, of course.  Or maybe this would just be another subclass of BlastDBbase...
\ No newline at end of file
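
Editorial sketch of the fixed-length storage idea proposed under "Database Classes" above: each read is padded to a fixed block size, so the read with integer ID n starts at byte offset n * max_len and can be reached with a single seek. This is only an illustration under stated assumptions. The class name FixedLengthReadStore, its methods, and the NUL padding byte are hypothetical and are not part of pygr, BlastDB, or the proposed SolexaDB.

```python
import os


class FixedLengthReadStore:
    """Hypothetical sketch, not a pygr class: reads stored as fixed-size blocks."""

    PAD = b"\0"  # assumed padding byte; never occurs in sequence text

    def __init__(self, path, max_len):
        self.path = path
        self.max_len = max_len
        # Create an empty store file if none exists yet.
        if not os.path.exists(path):
            open(path, "wb").close()

    def __len__(self):
        # Every record occupies exactly max_len bytes.
        return os.path.getsize(self.path) // self.max_len

    def append(self, seq):
        """Add a read and return its integer ID (assigned in ascending order)."""
        data = seq.encode("ascii")
        if len(data) > self.max_len:
            raise ValueError("read longer than max_len")
        new_id = len(self)
        with open(self.path, "ab") as f:
            f.write(data.ljust(self.max_len, self.PAD))
        return new_id

    def __getitem__(self, seq_id):
        """Fetch a read by seeking straight to its block."""
        if not 0 <= seq_id < len(self):
            raise KeyError(seq_id)
        with open(self.path, "rb") as f:
            f.seek(seq_id * self.max_len)
            block = f.read(self.max_len)
        return block.rstrip(self.PAD).decode("ascii")
```

Under the same assumptions, the Solexa string IDs could live in a second file of fixed-size blocks indexed by the same integer, with the reverse (string to integer) lookup kept in a shelve-style mapping as the wiki text suggests.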


