[pygr-notify] [pygr commit] r84 - wiki

Tue Jul 22 13:41:11 PDT 2008

Author: ramccreary
Date: Tue Jul 22 13:40:30 2008
New Revision: 84

Modified:
   wiki/SearchingforPatterns.wiki

Log:
Edited wiki page through web user interface.

Modified: wiki/SearchingforPatterns.wiki
==============================================================================

--- wiki/SearchingforPatterns.wiki	(original)
+++ wiki/SearchingforPatterns.wiki	Tue Jul 22 13:40:30 2008
@@ -2,15 +2,15 @@

 = Introduction =

-This is a continuation of the article GenomeCalculationsUsingpygr (for 
the full code, see article). By quantifying the trinucleotide repeats 
found in the E. coli genome, as well as the number of repeats per gene, 
the underlying organizational structure of the genome sequence can be 
further studied.
+This straightforward script is a continuation of the article 
GenomeCalculationsUsingpygr (see that article for the full code). By 
quantifying the trinucleotide repeats found in the E. coli genome, as 
well as the number of repeats per gene, the underlying organizational 
structure of the genome sequence can be studied further.


 = A Rundown of the Code =

+First, the number of nucleotide bases per gene is counted and stored 
in the dictionary ec_count. The gene sequences from the annotation db 
have to be converted to strings before they can be iterated through. 
Each gene's count is then assigned to the dictionary ecoli_nuc_count, 
which maps every gene to its number of bases. For example, 
the gene '1869' will have this value: '1869': {'A': 512, 'C': 478, 'W': 
0, 'G': 492, 'T': 452}.
+
 {{{
 ecoli_nuc_count = {}
-nucs = str(ecgenome)
-
 for gene, annot in annot_db.iteritems():
     ec_count = dict(A=0, C=0, T=0, G=0, W=0)
     genes = str(annot.sequence)
@@ -19,6 +19,8 @@
     ecoli_nuc_count[gene] = ec_count
 }}}
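
The per-gene counting described above can be sketched without pygr as follows. This is only an illustration under stated assumptions: `count_bases` and `toy_genes` are hypothetical names, and a plain dictionary of strings stands in for the annotation db.

```python
from collections import Counter


def count_bases(sequence, alphabet="ACGTW"):
    """Return a dict mapping each letter of `alphabet` to its count in `sequence`."""
    # str() mirrors the conversion the wiki code applies to annot.sequence.
    counts = Counter(str(sequence).upper())
    return {base: counts.get(base, 0) for base in alphabet}


# Hypothetical stand-in for annot_db: gene ID -> sequence string.
toy_genes = {"geneA": "ATGCGT", "geneB": "TTWAC"}

# Same shape as ecoli_nuc_count: gene ID -> {base: count}.
toy_nuc_count = {gene: count_bases(seq) for gene, seq in toy_genes.items()}

for gene, counts in toy_nuc_count.items():
    print(gene, counts)
```

Using `Counter` scans each sequence once; the result has the same gene-to-counts layout as the dictionary built in the wiki code.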

+In order to search for each potential codon, the combinations of 
A, T, G, and C must be defined; however, by using xselections, I was able 
to simply specify the nucleotides and the length of each resulting 
string, and thus all 64 possible codons were identified. The E. coli 
genome sequence is turned into a string, and the occurrences of each 
codon within it are counted and stored in the nucsum dictionary. Finally, 
the results are printed for the user to examine or record.
+
 {{{
 def xselections(items, n):
     if n==0: yield []
@@ -30,11 +32,14 @@
 nucsum = {}
 for uc in xselections(['G','A','T','C'],3):
     triplet = "".join(uc)
+    nucs = str(ecgenome)
     nuccount = nucs.count("".join(uc))
     nucsum[triplet] = nuccount
     print('The number of %s trinucleotide repeats\
  in the E. coli genome is %f' % (triplet, nuccount))
 }}}
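
As a rough sketch of the triplet search described above, the 64 triplets can also be generated with the standard library's `itertools.product`. The function name `count_triplets` and the toy sequence are assumptions made for illustration, not part of the wiki code. Note that `str.count` counts non-overlapping occurrences only, so the sketch also offers a sliding-window count for comparison.

```python
from itertools import product


def count_triplets(sequence, overlapping=False):
    """Count every 3-letter combination of G, A, T, C in `sequence`."""
    triplets = ["".join(p) for p in product("GATC", repeat=3)]
    if not overlapping:
        # str.count skips past each match, so overlapping hits are not counted.
        return {t: sequence.count(t) for t in triplets}
    counts = dict.fromkeys(triplets, 0)
    # Slide a 3-base window over the sequence, one position at a time.
    for i in range(len(sequence) - 2):
        window = sequence[i:i + 3]
        if window in counts:
            counts[window] += 1
    return counts


toy_sequence = "AAAATGATG"  # hypothetical stand-in for str(ecgenome)
plain = count_triplets(toy_sequence)
sliding = count_triplets(toy_sequence, overlapping=True)
print(plain["AAA"], sliding["AAA"], plain["ATG"])
```

Which count is appropriate depends on whether overlapping repeats should be tallied; the wiki code uses `str.count`, which corresponds to the non-overlapping variant.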
+
+This segment is essentially the same as the preceding segment, except 
that a structure is built to hold the values from the loop. The nucleotide 
triplets are defined, then each gene is searched for their presence. 
Once again, the results are printed.

 {{{
 genesum = {}