[khmer] Counting kmers and disabling reverse complement
Lester Mackey
lmackey at stanford.edu
Fri Jun 14 10:33:51 PDT 2013
Thanks Jordan and Titus!
Am I correct that Titus's script will also work with kt =
khmer.new_counting_hash(KSIZE, starting_size)? What is the difference
between new_counting_hash and new_hashtable?
Thanks again,
Lester
On Fri, Jun 14, 2013 at 7:36 AM, C. Titus Brown <ctb at msu.edu> wrote:
> Thanks, Jordan.
>
> Lester -- if you want to do standard pentamer signature analysis, here's
> a script I wrote --
>
> ---
>
> #! /usr/bin/env python
> import sys
> import khmer
> import screed
>
> KSIZE=5
>
> def main(inp_name, outp_name, min_seq_len):
>     outfp = open(outp_name, 'w')
>
>     min_seq_len = int(min_seq_len)
>
>     for record in screed.open(inp_name):
>         if len(record.sequence) < min_seq_len:
>             continue
>
>         kt = khmer.new_ktable(KSIZE)
>         kt.consume(record.sequence[:min_seq_len])
>
>         x = []
>         for i in range(4**KSIZE):
>             x.append("%s" % (kt.get(i),))
>
>         print >>outfp, " ".join(x)
>
> if __name__ == '__main__':
>     main(*sys.argv[1:4])
>
> ---
>
> On Fri, Jun 14, 2013 at 08:53:22AM -0400, Jordan Fish wrote:
> > Hi Lester,
> >
> > Unless you are working with fairly small k-values you will probably want to
> > use the CountingHash. Ktable handles simple exact counting, so for
> > large-ish values of k (>12, according to
> > http://khmer.readthedocs.org/en/latest/ktable.html) it'll blow up.
> >
> > The counting hash uses a bloom filter to limit memory usage at the cost of
> > inexact counting. Hopefully Titus will jump in here with a link to some
> > documentation on the inexact counting.
> >
> > Finally, if you want to force khmer to treat a kmer and its reverse
> > complement as unique you will need to edit 'lib/Makefile' and change the
> > line
> >
> > NO_UNIQUE_RC=0
> >
> > to
> >
> > NO_UNIQUE_RC=1
> >
> > and rebuild khmer
> >
> > Jordan
> >
> > On Fri, Jun 14, 2013 at 3:22 AM, Lester Mackey <lmackey at stanford.edu> wrote:
> >
> > > Dear khmer Discussion List,
> > >
> > > If my goal is to obtain a vector of kmer counts quickly from a FASTA or
> > > FASTQ file, is there any reason to prefer ktable to one of your other data
> > > structures, like the counting hash table?
> > >
> >
> > > I've noticed that ktable hashes a kmer and its reverse complement to the
> > > same bin. Is there an easy way to disable this feature (and thereby count
> > > each kmer and reverse complement separately)?
> > >
> > > Thanks,
> > > Lester
> > >
> > > _______________________________________________
> > > khmer mailing list
> > > khmer at lists.idyll.org
> > > http://lists.idyll.org/listinfo/khmer
> > >
> > >
>
>
>
> --
> C. Titus Brown, ctb at msu.edu
>
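
For readers who want to see what "counting a kmer and its reverse complement separately" means in practice, here is a minimal pure-Python sketch. It does not use khmer at all: the function names (reverse_complement, count_kmers, count_vector) and the collapse_rc parameter are made up for illustration, and the lexicographic A/C/G/T ordering of the output vector is an assumption that need not match the index order used by ktable. It only illustrates the two counting conventions discussed in the thread and is far slower than khmer on real data.

```python
from itertools import product

# Watson-Crick complements for the four unambiguous DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(kmer):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def count_kmers(sequence, k, collapse_rc=False):
    """Count k-mers in a sequence and return a dict of k-mer -> count.

    With collapse_rc=False every k-mer is counted as read (strand-specific).
    With collapse_rc=True a k-mer and its reverse complement share one bin,
    keyed by whichever of the two sorts first.
    """
    counts = {}
    sequence = sequence.upper()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Skip windows that contain N or any other ambiguity code.
        if any(base not in COMPLEMENT for base in kmer):
            continue
        if collapse_rc:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def count_vector(sequence, k, collapse_rc=False):
    """Return a list of 4**k counts, ordered lexicographically over ACGT."""
    counts = count_kmers(sequence, k, collapse_rc)
    return [counts.get("".join(p), 0) for p in product("ACGT", repeat=k)]


if __name__ == "__main__":
    seq = "AAGCTT"
    print(count_kmers(seq, 2))                    # strand-specific bins
    print(count_kmers(seq, 2, collapse_rc=True))  # reverse complements merged
    print(len(count_vector(seq, 5)))              # length of a pentamer vector
```

The exact vector has 4**k entries (1024 for k=5, about 16.8 million for k=12), which is why an exact table is only practical for small k.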