[khmer] Counting kmers and disabling reverse complement

C. Titus Brown ctb at msu.edu
Fri Jun 14 07:36:35 PDT 2013


Thanks, Jordan.

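Lester, if you want to do standard pentamer signature analysis, here's a script
I wrote. It writes one line of 4**5 = 1024 pentamer counts per sequence,
counting only the first min_seq_len bases and skipping shorter sequences: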

---

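#! /usr/bin/env python
"""Write one pentamer-count vector per sequence (khmer ktable API, 2013)."""
from __future__ import print_function

import sys

import khmer
import screed

KSIZE = 5


def main(inp_name, outp_name, min_seq_len):
    min_seq_len = int(min_seq_len)

    with open(outp_name, 'w') as outfp:
        for record in screed.open(inp_name):
            # Skip sequences too short to fill the counting window.
            if len(record.sequence) < min_seq_len:
                continue

            # Fresh exact-count table per sequence; only the first
            # min_seq_len bases are counted, so every vector is built
            # from the same amount of sequence.
            kt = khmer.new_ktable(KSIZE)
            kt.consume(record.sequence[:min_seq_len])

            # One count per possible k-mer index (4**KSIZE entries).
            counts = [str(kt.get(i)) for i in range(4 ** KSIZE)]
            print(" ".join(counts), file=outfp)


if __name__ == '__main__':
    if len(sys.argv) != 4:
        sys.exit("usage: %s <input> <output> <min_seq_len>" % sys.argv[0])
    main(*sys.argv[1:4])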
    
---
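For comparison, here is a minimal pure-Python sketch of the same kind of vector that does not depend on khmer and counts each k-mer and its reverse complement separately (strand-specific). The helper name `kmer_vector` is hypothetical, and the lexicographic ACGT index order is an assumption that may not match ktable's internal ordering, so its columns should not be compared one-to-one with ktable output.

```python
from itertools import product


def kmer_vector(seq, k=5):
    """Return 4**k strand-specific k-mer counts for seq (hypothetical helper).

    Index i corresponds to the i-th k-mer in lexicographic ACGT order
    (AA...A, AA...C, ..., TT...T). A k-mer and its reverse complement
    land in different bins. K-mers containing non-ACGT symbols are skipped.
    """
    # Map every possible k-mer to its position in the output vector.
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * (4 ** k)
    seq = seq.upper()
    # Slide a window of width k across the sequence.
    for start in range(len(seq) - k + 1):
        idx = index.get(seq[start:start + k])
        if idx is not None:
            counts[idx] += 1
    return counts
```

To mimic the script above, each output line would be `" ".join(str(c) for c in kmer_vector(record.sequence[:min_seq_len]))` for every screed record that passes the length check. Building the index dictionary once per call is wasteful for many sequences; hoisting it out is an easy optimization.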

On Fri, Jun 14, 2013 at 08:53:22AM -0400, Jordan Fish wrote:
> Hi Lester,
> 
> Unless you are working with fairly small k-values, you will probably want to
> use the CountingHash.  Ktable does simple exact counting, keeping a slot for
> every possible k-mer, so for large-ish values of k (>12, according to
> http://khmer.readthedocs.org/en/latest/ktable.html) it'll blow up.
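The arithmetic behind that limit: an exact table needs one counter for every possible k-mer, i.e. 4^k slots, so each increment of k quadruples memory. A quick sketch, assuming (hypothetically) 8-byte counters; ktable's actual counter width may differ, which would scale every figure by a constant but not change the growth rate.

```python
def exact_table_bytes(k, bytes_per_counter=8):
    """Bytes for a dense table holding one counter per possible k-mer.

    bytes_per_counter=8 (a 64-bit counter) is an assumption for
    illustration, not a figure taken from the khmer source.
    """
    return (4 ** k) * bytes_per_counter


if __name__ == "__main__":
    for k in (5, 8, 12, 13, 16, 20):
        gib = exact_table_bytes(k) / 2 ** 30
        print("k=%2d: %16d slots, %12.3f GiB" % (k, 4 ** k, gib))
```

Under those assumptions, k=12 needs 128 MiB, k=16 needs 32 GiB and k=20 needs 8 TiB.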
> 
> The counting hash uses a Bloom filter to limit memory usage at the cost of
> inexact counting.  Hopefully Titus will jump in here with a link to some
> documentation on the inexact counting.
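For intuition about inexact counting, here is a toy count-min sketch, the Bloom-filter relative that khmer's counting hash is generally described as using: every k-mer increments one counter in each of several differently sized tables, and a query returns the minimum across tables. Collisions can only inflate a count, never shrink it. The class, table sizes and encoding are illustrative assumptions, not khmer's CountingHash implementation.

```python
class CountMinSketch:
    """Toy count-min sketch: counts may be overestimated, never underestimated."""

    def __init__(self, table_sizes=(997, 1009, 1013, 1019)):
        # Illustrative prime sizes; distinct sizes make it unlikely that two
        # k-mers collide in every table at once.
        self.sizes = table_sizes
        self.tables = [[0] * n for n in table_sizes]

    def _key(self, kmer):
        # 2-bit encoding (A=0, C=1, G=2, T=3); assumes uppercase ACGT only.
        value = 0
        for base in kmer:
            value = value * 4 + "ACGT".index(base)
        return value

    def add(self, kmer):
        # Increment one counter per table.
        key = self._key(kmer)
        for size, table in zip(self.sizes, self.tables):
            table[key % size] += 1

    def get(self, kmer):
        # The smallest counter is the least collision-inflated estimate.
        key = self._key(kmer)
        return min(table[key % size] for size, table in zip(self.sizes, self.tables))
```

Memory is fixed by the table sizes rather than by 4^k, which is the trade Jordan describes: bounded memory in exchange for occasional overcounts.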
> 
> Finally, if you want to force khmer to treat a kmer and its reverse
> complement as distinct (i.e., count them separately), you will need to edit
> 'lib/Makefile' and change the line
> 
> NO_UNIQUE_RC=0
> 
> to
> 
> NO_UNIQUE_RC=1
> 
> and rebuild khmer.
> 
> Jordan
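Conceptually, merging a k-mer with its reverse complement means keying each k-mer by a canonical form, for example the smaller of the 2-bit encodings of the k-mer and of its reverse complement; the NO_UNIQUE_RC switch, as Jordan describes it, turns that merging off. A sketch of both keys follows; the function names and the min-of-both-strands rule are assumptions for illustration, not a transcription of khmer's hashing code.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def two_bit(kmer):
    """Pack a k-mer into an integer, 2 bits per base (A=0, C=1, G=2, T=3)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def two_bit_revcomp(kmer):
    """Encode the reverse complement: read backwards, complement via 3 - code."""
    value = 0
    for base in reversed(kmer):
        value = (value << 2) | (3 - ENCODE[base])
    return value


def kmer_key(kmer, merge_rc=True):
    """Canonical key (shared by both strands) or strand-specific key."""
    forward = two_bit(kmer)
    if not merge_rc:
        return forward
    return min(forward, two_bit_revcomp(kmer))
```

With merge_rc=True, "AAAAC" and its reverse complement "GTTTT" share key 1; with merge_rc=False they get keys 1 and 767 and would be counted separately.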
> 
> On Fri, Jun 14, 2013 at 3:22 AM, Lester Mackey <lmackey at stanford.edu> wrote:
> 
> > Dear khmer Discussion List,
> >
> > If my goal is to obtain a vector of kmer counts quickly from a FASTA or
> > FASTQ file, is there any reason to prefer ktable to one of your other data
> > structures, like the counting hash table?
> >
> >
> > I've noticed that ktable hashes a kmer and its reverse complement to the
> > same bin.  Is there an easy way to disable this feature (and thereby count
> > each kmer and reverse complement separately)?
> >
> > Thanks,
> > Lester
> >
> > _______________________________________________
> > khmer mailing list
> > khmer at lists.idyll.org
> > http://lists.idyll.org/listinfo/khmer
> >
> >



-- 
C. Titus Brown, ctb at msu.edu


