<div dir="ltr">On Thu, Jun 20, 2013 at 4:40 PM, Lester Mackey <span dir="ltr"><<a href="mailto:lmackey@stanford.edu" target="_blank">lmackey@stanford.edu</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Thanks Titus,<div><br><div class="gmail_extra"><div class="gmail_quote"><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>
> Does the counting hash have a built-in way to enumerate used hash table<br>
> entries without having to iterate over every hash table entry?<br>
<br>
</div>Err, no :). Are you using ktable or counting hash? If the latter then you<br>
absolutely need to keep an explicit list of query k-mers.<br>
<div><br>
> On a slightly related note, if I have 16GB of memory to work with, is it<br>
> advisable to choose hash_size = min(4**k, 16e9)/4 and n_tables = 4 when<br>
> calling new_counting_hash for k-mer counting?<br>
<br></div></blockquote></div><div>If I wanted to minimize the amount of memory used when k is small (i.e., when 4**k bytes is much smaller than 16GB), would setting <span style="color:rgb(80,0,80)">hash_size = 4**(k-1) and n_tables = 4 or</span></div>
<div><span style="color:rgb(80,0,80)">hash_size = 4**k and n_tables = 1 be sufficient for a small false positive rate?</span></div></div></div></div></div></blockquote><div><br></div><div>Those configurations should give you small error rates (with hash_size = 4^k there is a slot for every possible k-mer, so there shouldn't be any false positives at all...), but there aren't any hard-and-fast rules. However, if you're really worried about false positives and are working with small k values, you really should just use the ktable class.</div>
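<div><br></div><div>For a rough feel of the trade-off, here is a minimal sketch that estimates memory use and false positive rate for the two configurations discussed above. It does not call khmer at all. It assumes one byte per counter (so memory is about hash_size * n_tables bytes) and uses the standard Bloom-filter-style approximation in which k-mers hash uniformly at random into each table. The function name counting_hash_estimate, the choice k = 12, and the number of distinct k-mers are all hypothetical. Because of the random-hashing assumption, the estimate is pessimistic for the hash_size = 4**k case, where every k-mer can have its own slot.</div>
<pre>
```python
import math


def counting_hash_estimate(hash_size, n_tables, n_unique_kmers):
    """Return (approx_memory_bytes, approx_false_positive_rate).

    Assumptions (not taken from khmer itself):
      * each table holds hash_size one-byte counters
      * k-mers land uniformly at random in each table
    """
    memory_bytes = hash_size * n_tables
    # Expected fraction of occupied slots in a single table.
    occupancy = 1.0 - math.exp(-n_unique_kmers / hash_size)
    # A k-mer that was never inserted looks "present" only if its slot
    # happens to be occupied in every one of the tables.
    fp_rate = occupancy ** n_tables
    return memory_bytes, fp_rate


k = 12                # hypothetical small k
n_unique = 2_000_000  # hypothetical number of distinct k-mers in the data

for hash_size, n_tables in [(4 ** (k - 1), 4), (4 ** k, 1)]:
    mem, fp = counting_hash_estimate(hash_size, n_tables, n_unique)
    print(f"hash_size={hash_size} n_tables={n_tables} "
          f"memory={mem / 1e6:.1f} MB approx_fp={fp:.4f}")
```
</pre>
<div>Both configurations use the same total memory under these assumptions, so the comparison is only about how that memory is split across tables. Treat the numbers as a sanity check rather than a guarantee.</div>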
<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div class="gmail_extra"><div class="gmail_quote"><div><span style="color:rgb(80,0,80)"><br></span></div><div><br></div><div>Thanks,</div>
<div>Lester</div><div><div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div><div><br>
> > ><br>
> > > > On Fri, Jun 14, 2013 at 3:22 AM, Lester Mackey <<a href="mailto:lmackey@stanford.edu" target="_blank">lmackey@stanford.edu</a>> wrote:<br>
> > > ><br>
> > > >> Dear khmer Discussion List,<br>
> > > >><br>
> > > >> If my goal is to obtain a vector of kmer counts quickly from a FASTA or<br>
> > > >> FASTQ file, is there any reason to prefer ktable to one of your other data<br>
> > > >> structures, like the counting hash table?<br>
> > > >><br>
> > > ><br>
> > > >> I've noticed that ktable hashes a kmer and its reverse complement to the<br>
> > > >> same bin. Is there an easy way to disable this feature (and thereby count<br>
> > > >> each kmer and reverse complement separately)?<br>
> > > >><br>
> > > >> Thanks,<br>
> > > >> Lester<br>
> > > >><br>
> > > >> _______________________________________________<br>
> > > >> khmer mailing list<br>
> > > >> <a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a><br>
> > > >> <a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>
> > > >><br>
> > > >><br>
> > > ><br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > C. Titus Brown, <a href="mailto:ctb@msu.edu" target="_blank">ctb@msu.edu</a><br>
> ><br>
<br>
</div></div></blockquote></div></div></div><br></div></div></div>
</blockquote></div><br></div></div>