[pygr-notify] [pygr commit] r84 - wiki
codesite-noreply at google.com
Tue Jul 22 13:41:11 PDT 2008
Author: ramccreary
Date: Tue Jul 22 13:40:30 2008
New Revision: 84
Modified:
wiki/SearchingforPatterns.wiki
Log:
Edited wiki page through web user interface.
Modified: wiki/SearchingforPatterns.wiki
==============================================================================
--- wiki/SearchingforPatterns.wiki (original)
+++ wiki/SearchingforPatterns.wiki Tue Jul 22 13:40:30 2008
@@ -2,15 +2,15 @@
= Introduction =
-This is a continuation of the article GenomeCalculationsUsingpygr (for
the full code, see article). By quantifying the trinucleotide repeats
found in the E. coli genome, as well as the number of repeats per gene,
the underlying organizational structure of the genome sequence can be
further studied.
+This straightforward script is a continuation of the article
GenomeCalculationsUsingpygr (for the full code, see article). By
quantifying the trinucleotide repeats found in the E. coli genome, as
well as the number of repeats per gene, the underlying organizational
structure of the genome sequence can be further studied.
= A Rundown of the Code =
+First, the number of nucleotide bases per gene is counted and stored
in the dictionary ec_count. The gene sequences from the annotation db
must be converted to strings before they can be iterated through.
Each gene's ec_count is then assigned to the dictionary ecoli_nuc_count,
which holds both the genes and the number of bases per gene. For example,
the gene '1869' will have this value: '1869': {'A': 512, 'C': 478, 'W':
0, 'G': 492, 'T': 452}.
+
{{{
ecoli_nuc_count = {}
-nucs = str(ecgenome)
-
for gene, annot in annot_db.iteritems():
ec_count = dict(A=0, C=0, T=0, G=0, W=0)
genes = str(annot.sequence)
@@ -19,6 +19,8 @@
ecoli_nuc_count[gene] = ec_count
}}}
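The per-gene base count described above can be sketched in isolation as follows. This is a minimal illustration, not the wiki's code: `count_bases` and `toy_genes` are hypothetical names, and the plain dictionary of strings stands in for the pygr annotation db, which is assumed to yield sequences convertible with `str()`.

```python
def count_bases(sequence, alphabet="ACGTW"):
    """Count how often each base in `alphabet` occurs in `sequence`."""
    counts = dict.fromkeys(alphabet, 0)
    for base in str(sequence).upper():
        if base in counts:  # characters outside the alphabet are ignored
            counts[base] += 1
    return counts


# Hypothetical stand-in for the annotation db: gene id -> sequence string.
toy_genes = {"geneA": "ATGCGT", "geneB": "AAWT"}

# Same shape as ecoli_nuc_count: gene id -> {base: count}.
ecoli_nuc_count = {gene: count_bases(seq) for gene, seq in toy_genes.items()}
```

Keeping the counting in a small function means each gene gets a fresh dictionary, which is what the `ec_count = dict(...)` line inside the loop achieves in the wiki code.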
+In order to search for each potential codon, the combinations of
A, T, G, and C must be defined; however, by using xselections, I was able
to simply specify the nucleotides and the length of each resulting
string, and thus all 64 possible codons were identified. The E. coli
genome sequence is turned into a string, and the occurrences of each
codon within it are counted and stored in the nucsum dictionary. Finally,
the results are printed for the user to examine or record.
+
{{{
def xselections(items, n):
if n==0: yield []
@@ -30,11 +32,14 @@
nucsum = {}
for uc in xselections(['G','A','T','C'],3):
triplet = "".join(uc)
+ nucs = str(ecgenome)
nuccount = nucs.count("".join(uc))
nucsum[triplet] = nuccount
print('The number of %s trinucleotide repeats '
'in the E. coli genome is %d' % (triplet, nuccount))
}}}
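As a rough sketch of the triplet enumeration and counting described above, the standard library's `itertools.product` can be swapped in for the `xselections` helper; it is assumed here to yield the same 64 triplets. The names `count_triplets` and `toy_genome` are hypothetical, and a short string stands in for `str(ecgenome)`. Note that `str.count` counts non-overlapping occurrences only, so a run such as `AAAA` contributes one `AAA`, not two.

```python
from itertools import product


def count_triplets(sequence, bases="GATC"):
    """Map each of the 4**3 = 64 triplets to its str.count in `sequence`."""
    sequence = str(sequence)  # convert once, outside the loop
    return {
        "".join(triplet): sequence.count("".join(triplet))
        for triplet in product(bases, repeat=3)
    }


# Hypothetical stand-in for the E. coli genome string.
toy_genome = "ATGATGAAAA"
nucsum = count_triplets(toy_genome)

for triplet in ("ATG", "TGA", "AAA"):
    print("The number of %s trinucleotide repeats is %d" % (triplet, nucsum[triplet]))
```

Converting the sequence to a string once, before the loop, avoids repeating that work for each of the 64 triplets.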
+
+This segment is essentially the same as the preceding segment, except
a structure is built to hold the values from the loop. The nucleotide
triplets are defined, then each gene is searched for their presence.
Once again, the results are printed.
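One possible shape for such a structure is a nested dictionary, gene id to triplet to count. The sketch below assumes that layout; the source does not show how genesum is actually filled, and `triplets_per_gene` and `toy_genes` are hypothetical names used only for illustration.

```python
from itertools import product

# All 64 triplets over G, A, T, C, built once and reused for every gene.
TRIPLETS = ["".join(t) for t in product("GATC", repeat=3)]


def triplets_per_gene(genes):
    """Return {gene id: {triplet: non-overlapping count in that gene}}."""
    result = {}
    for name, seq in genes.items():
        seq = str(seq)
        result[name] = {t: seq.count(t) for t in TRIPLETS}
    return result


# Hypothetical stand-in for the annotation db: gene id -> sequence string.
toy_genes = {"geneA": "ATGATG", "geneB": "CCCGGG"}
genesum = triplets_per_gene(toy_genes)
```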
{{{
genesum = {}