[khmer] questions about khmer utilities

C. Titus Brown ctb at msu.edu
Mon Aug 19 18:24:57 PDT 2013

On Tue, Aug 20, 2013 at 12:50:29AM +0000, Kenlee Nakasugi wrote:
> I'm just starting to do some k-mer-type analyses on my read data.
> I am trying to find any differences between two read datasets and to get some basic k-mer stats, but more importantly I am trying to figure out what the best type of comparison would be.
> I have already generated the hash (.kh) and hist (.hist) via the load-into-counting.py and abundance-dist.py scripts from khmer v0.4.
> Now from http://khmer.readthedocs.org/en/latest/blog-posts.html , I wanted to get the abundance by position and the hi-lo k-mer distributions, but are the scripts listed there (and found in the sandbox directory in the khmer distribution) compatible with the output from load-into-counting.py and abundance-dist.py?
> I tried inputting the .hist and hash tables but am getting errors like:
> ##
> python /usr/local/bin/khmer/sandbox/abundance-hist-by-position.py R1.k32.hist 
> ... 0
> Traceback (most recent call last):
>   File "/usr/local/bin/khmer/sandbox/abundance-hist-by-position.py", line 15, in <module>
>     countSum[i] += int(tok[i])
> ValueError: invalid literal for int() with base 10: '0.617'
> ##
> Do I need to run these scripts from http://khmer.readthedocs.org/en/latest/blog-posts.html from scratch?
> And are any other types of comparisons useful?

Hi Ken,

Yep, those scripts appear to be out of date with respect to the instructions;
thanks for pointing this out!
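
The traceback looks like a format mismatch: the .hist file carries a
fractional value ('0.617'), while abundance-hist-by-position.py calls int()
on every token. Here is a minimal sketch of that failure and of a more
tolerant parser. The sample line and the column meanings (abundance, count,
cumulative count, cumulative fraction) are assumptions for illustration, not
taken from your file, and parse_hist_line is a hypothetical helper, not part
of khmer:

```python
def parse_hist_line(line):
    """Split one whitespace-separated line, keeping fractions as floats."""
    values = []
    for tok in line.split():
        try:
            values.append(int(tok))
        except ValueError:
            # Not an integer literal (e.g. '0.617'), so fall back to float.
            values.append(float(tok))
    return values


# Hypothetical .hist line; the real column layout may differ.
sample = "1 5000 5000 0.617"

try:
    [int(tok) for tok in sample.split()]
except ValueError as err:
    # Same class of error as in the traceback above.
    print("int() fails on the fractional column:", err)

print(parse_hist_line(sample))
```

This only shows why the .hist file is rejected; it does not make the .hist
file a valid input for the by-position script, which expects a different file.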

If you want to get abundance-1 and abundance-255 k-mers by position,
you can use

	python sandbox/hi-lo-abundance-by-position.py 25k data/25k.fq.gz

and you'll see the output in files ending in .pos.abund=1 and .pos.abund=255.

To use the by-position scripts, here is a short guide with the sequences
in 25k.fq.gz:

	python scripts/load-into-counting.py -k 20 25k data/25k.fq.gz
	python sandbox/fasta-to-abundance-hist.py 25k data/25k.fq.gz
	python sandbox/abundance-hist-by-position.py data/25k.fq.gz.freq > out.dist
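
If you want to repeat these three steps for several read files, they can be
wrapped in a small driver. This is a sketch only: the function names are
hypothetical, it assumes you run it from the top of the khmer source tree, and
it assumes the second step always writes its output to the reads filename plus
".freq", as in the example above.

```python
import subprocess


def by_position_commands(table, reads, ksize=20):
    """Return the three steps as argument lists, in the order shown above."""
    return [
        ["python", "scripts/load-into-counting.py",
         "-k", str(ksize), table, reads],
        ["python", "sandbox/fasta-to-abundance-hist.py", table, reads],
        # Assumption: the previous step wrote '<reads>.freq'.
        ["python", "sandbox/abundance-hist-by-position.py", reads + ".freq"],
    ]


def run_by_position(table, reads, out_path, ksize=20):
    """Run the steps in order, stopping at the first one that fails."""
    steps = by_position_commands(table, reads, ksize)
    for cmd in steps[:-1]:
        subprocess.check_call(cmd)
    # The last script prints to stdout, so redirect it into out_path.
    with open(out_path, "w") as out:
        subprocess.check_call(steps[-1], stdout=out)
```

Calling run_by_position("25k", "data/25k.fq.gz", "out.dist") would then
correspond to the three commands above.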

We'll amend the documentation - thanks again for letting us know about this!

C. Titus Brown, ctb at msu.edu
