[khmer] Extracting kmer sequences from khmer output

C. Titus Brown ctbrown at ucdavis.edu
Sat Sep 19 06:08:46 PDT 2015


On Fri, Sep 18, 2015 at 03:25:39PM -0700, Miller, Ruth wrote:
> Hi,
> 
> I am hoping to get a list of the sequence of each kmer identified from my dataset and its abundance. This would allow me to compare the abundance of different kmers in my sample set, to see whether certain samples cluster together based on the abundance of kmers present.
> 
> Is there a way to do this in khmer?
> 
> Thanks,
> 
> Ruth

Hi Ruth,

yep! In khmer 2.0 we've added the 'count-kmers.py' and 'count-kmers-single.py'
scripts in the sandbox/ directory; you can either do

	load-into-counting.py kmers.g file1.fq file2.fq file3.fq ...
	sandbox/count-kmers.py kmers.g file1.fq

if you have multiple files, or

	sandbox/count-kmers-single.py file1.fq

if you have only one.
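
For intuition about what a "kmer plus abundance" listing looks like, here is a minimal pure-Python sketch of exact k-mer counting on two toy reads. This is not how khmer does it: khmer uses a fixed-memory probabilistic countgraph, while this sketch uses an exact in-memory Counter and only looks at the forward strand. The function name count_kmers and the toy reads are made up for illustration.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every length-k substring across the given sequences.

    Hypothetical helper for illustration only; exact counting,
    forward strand only, k-mers containing non-ACGT bases are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


# Toy reads, invented for this example.
reads = ["ATGGCATG", "GGCATGAA"]
counts = count_kmers(reads, k=4)

# Print one "kmer abundance" pair per line, sorted by k-mer.
for kmer, abundance in sorted(counts.items()):
    print(kmer, abundance)
```

An exact counter like this is fine for small inputs, but its memory grows with the number of distinct k-mers, which is why the khmer scripts use a fixed-size countgraph instead.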

The graph size parameters (-M, or -x / -N) still apply for setting the
size of the kmer countgraph database; see

	http://khmer.readthedocs.org/en/v2.0/user/choosing-table-sizes.html

for more info.
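
As a rough back-of-the-envelope aid for picking those parameters, the sketch below assumes (as I understand the page linked above) that a counting table uses about one byte per entry, so total memory is roughly the number of tables (-N) times the table size (-x), and that a -M budget is split evenly across the tables. The function names are hypothetical and the numbers are only examples; check the linked page before relying on them.

```python
def countgraph_bytes(n_tables, table_size):
    """Approximate memory for a countgraph, ASSUMING one byte per entry."""
    return n_tables * table_size


def table_size_for_budget(memory_bytes, n_tables):
    """Per-table size if a memory budget is split evenly across tables.

    ASSUMPTION: mirrors how a -M style budget would be divided;
    hypothetical helper, not part of khmer.
    """
    return memory_bytes // n_tables


# Example: 4 tables of 1e9 entries each is about 4 GB.
total = countgraph_bytes(4, 10**9)
print(total / 1e9, "GB")

# Example: an 8 GB budget over 4 tables gives 2e9 entries per table.
print(table_size_for_budget(8 * 10**9, 4))
```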

Now, the other problem is that count-kmers.py/count-kmers-single.py aren't
installed with 'pip install khmer' because they're not yet part of the
supported scripts.  So you'll need to grab them from the URLs below,

https://github.com/dib-lab/khmer/blob/master/sandbox/count-kmers.py
https://github.com/dib-lab/khmer/blob/master/sandbox/count-kmers-single.py

(click 'Raw' to download the text file) *or* check out the git repository
and call them from sandbox.

Let me know if you have any problems --

best,
--titus
-- 
C. Titus Brown, ctbrown at ucdavis.edu


