<div dir="ltr">Thanks Jordan and Titus!<div><br></div><div>Am I correct that Titus's script will also work with kt = khmer.new_counting_hash(KSIZE, starting_size)? What is the difference between new_counting_hash and new_hashtable?</div>
<div><br></div><div style>Thanks again,</div><div style>Lester</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Jun 14, 2013 at 7:36 AM, C. Titus Brown <span dir="ltr"><<a href="mailto:ctb@msu.edu" target="_blank">ctb@msu.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks, Jordan.<br>
<br>
Lester -- if you want to do standard pentamer signature analysis, here's<br>
a script I wrote --<br>
<br>
---<br>
<br>
#! /usr/bin/env python<br>
import sys<br>
import khmer<br>
import screed<br>
<br>
KSIZE=5<br>
<br>
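def main(inp_name, outp_name, min_seq_len):<br>
&nbsp;&nbsp;&nbsp;&nbsp;outfp = open(outp_name, 'w')<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;min_seq_len = int(min_seq_len)<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;for record in screed.open(inp_name):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if len(record.sequence) < min_seq_len:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;kt = khmer.new_ktable(KSIZE)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;kt.consume(record.sequence[:min_seq_len])<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;x = []<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for i in range(4**KSIZE):<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;x.append("%s" % (kt.get(i),))<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print >>outfp, " ".join(x)<br>
<br>
if __name__ == '__main__':<br>
&nbsp;&nbsp;&nbsp;&nbsp;main(*sys.argv[1:4])<br>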
<br>
---<br>
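<br>
For anyone following the thread without khmer installed, here is a rough plain-Python 3 sketch of the kind of count vector the script above writes per record. It is not khmer's code: the function name kmer_count_vector, the collapse_rc flag and the alphabetical A, C, G, T bin order are assumptions made for illustration, so the bin positions may not line up with what kt.get(i) returns. Setting collapse_rc=False counts each k-mer and its reverse complement separately, which is the behaviour Lester asked about.<br>

```python
from itertools import product

KSIZE = 5
ALPHABET = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def kmer_count_vector(sequence, k=KSIZE, collapse_rc=True):
    """Count every k-mer in `sequence` and return a list of 4**k counts.

    Bins follow alphabetical k-mer order (an assumption, not khmer's order).
    With collapse_rc=True a k-mer and its reverse complement share the bin
    of whichever of the two sorts first; the other bin stays at zero.
    """
    all_kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(all_kmers, 0)
    seq = sequence.upper()
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer not in counts:
            # Skip windows containing N or any other non-ACGT symbol.
            continue
        if collapse_rc:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return [counts[kmer] for kmer in all_kmers]
```

For example, kmer_count_vector("ACGTA", k=2, collapse_rc=False) gives a 16-entry list with a single count each for AC, CG, GT and TA; with collapse_rc=True the GT occurrence is folded into the AC bin.<br>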
<div class="HOEnZb"><div class="h5"><br>
On Fri, Jun 14, 2013 at 08:53:22AM -0400, Jordan Fish wrote:<br>
> Hi Lester,<br>
><br>
> Unless you are working with fairly small k-values you will probably want to<br>
> use the CountingHash. Ktable handles simple exact counting, so for<br>
> large-ish values of k (>12, according to<br>
> <a href="http://khmer.readthedocs.org/en/latest/ktable.html" target="_blank">http://khmer.readthedocs.org/en/latest/ktable.html</a>) it'll blow up.<br>
><br>
> The counting hash uses a bloom filter to limit memory usage at the cost of<br>
> inexact counting. Hopefully Titus will jump in here with a link to some<br>
> documentation on the inexact counting.<br>
><br>
> Finally, if you want to force khmer to treat a kmer and its reverse<br>
> complement as unique you will need to edit 'lib/Makefile' and change the<br>
> line<br>
><br>
> NO_UNIQUE_RC=0<br>
><br>
> to<br>
><br>
> NO_UNIQUE_RC=1<br>
><br>
> and rebuild khmer<br>
><br>
> Jordan<br>
><br>
> On Fri, Jun 14, 2013 at 3:22 AM, Lester Mackey <<a href="mailto:lmackey@stanford.edu">lmackey@stanford.edu</a>> wrote:<br>
><br>
> > Dear khmer Discussion List,<br>
> ><br>
> > If my goal is to obtain a vector of kmer counts quickly from a FASTA or<br>
> > FASTQ file, is there any reason to prefer ktable to one of your other data<br>
> > structures, like the counting hash table?<br>
> ><br>
><br>
> > I've noticed that ktable hashes a kmer and its reverse complement to the<br>
> > same bin. Is there an easy way to disable this feature (and thereby count<br>
> > each kmer and reverse complement separately)?<br>
> ><br>
> > Thanks,<br>
> > Lester<br>
> ><br>
> > _______________________________________________<br>
> > khmer mailing list<br>
> > <a href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a><br>
> > <a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>
> ><br>
> ><br>
<br>
<br>
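Jordan's point about trading exact counts for bounded memory can be illustrated with a small Count-Min-style sketch in Python 3. This is a toy written for this thread, not khmer's implementation: the class name TinyCountingSketch, the sha256-based hashing and the default of four tables are all assumptions. The property it shows is that an estimate can overshoot the true count when k-mers collide, but never undershoots it.<br>

```python
import hashlib


class TinyCountingSketch:
    """Approximate counter: several fixed-size tables, one hash per table."""

    def __init__(self, table_size, n_tables=4):
        self.table_size = table_size
        self.tables = [[0] * table_size for _ in range(n_tables)]

    def _slots(self, item):
        # Derive one slot per table by salting the hash with the table index.
        for index in range(len(self.tables)):
            digest = hashlib.sha256(f"{index}:{item}".encode()).digest()
            yield index, int.from_bytes(digest[:8], "big") % self.table_size

    def add(self, item):
        """Record one occurrence of `item` in every table."""
        for index, slot in self._slots(item):
            self.tables[index][slot] += 1

    def get(self, item):
        """Return the estimated count of `item`.

        Collisions can only inflate a slot, so the minimum across tables
        is the tightest estimate and is never below the true count.
        """
        return min(self.tables[index][slot] for index, slot in self._slots(item))
```

Memory use is fixed at table_size * n_tables counters regardless of k, whereas an exact table needs 4**k bins; shrinking table_size saves memory at the price of more overcounting.<br>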
<br>
</div></div><span class="HOEnZb"><font color="#888888">--<br>
C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a><br>
</font></span></blockquote></div><br></div></div>