[khmer] less reads but more kmers?

C. Titus Brown ctb at msu.edu
Sat Jan 18 05:50:58 PST 2014


On Sat, Jan 18, 2014 at 05:00:03AM -0800, C. Titus Brown wrote:
> On Fri, Jan 17, 2014 at 09:31:32PM -0200, Nacho Caballero wrote:
> > I used khmer to digitally normalize two assemblies:
> > 
> >    - After normalization, Assembly A has *1.5 million reads*, and during
> >    assembly SPAdes uses *116 million* kmers (k=37)
> >    - After normalization, Assembly B has *1.5 million reads*, during
> >    assembly SPAdes uses *612 million* kmers (k=37)
> > 
> > I followed the same protocol on both assemblies (quality filtering with
> > Trimmomatic, 3-pass normalization, etc.), so I don't understand why
> > assembly B, with 16x fewer reads, has 8x more kmers than assembly A.
> > 
> > What are some possible explanations?
> 
> Barring some extraordinarily bizarre bug, the answer *must* be SPAdes
> is *choosing to use* more k-mers... I'll ask the SPAdes authors ;)

Anton (one of the SPAdes authors) pointed out that I'd misread the e-mail.
If dataset A and dataset B are from different samples, then they could easily
have different levels of diversity, which would lead to different numbers of
k-mers for the same coverage level.
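Anton's point can be illustrated with a toy simulation. This is a sketch, not khmer or SPAdes code. Two hypothetical samples get the same number of reads. Sample A's reads come from one random sequence, while sample B's reads come from twenty distinct random sequences. As a result, B has much lower per-sequence coverage and many more distinct k-mers. The sequence sizes, read length, read count, and k=21 are arbitrary assumptions. Reverse-complement k-mers are not merged here, although real k-mer counters often do merge them.

```python
import random

def distinct_kmers(reads, k):
    """Set of distinct k-mers across all reads (reverse complements not merged)."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers

def random_genome(length, rng):
    """A uniformly random DNA sequence standing in for one genome."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def sample_reads(genomes, n_reads, read_len, rng):
    """Draw error-free reads uniformly from a list of genomes."""
    reads = []
    for _ in range(n_reads):
        g = rng.choice(genomes)
        start = rng.randrange(len(g) - read_len + 1)
        reads.append(g[start:start + read_len])
    return reads

rng = random.Random(42)
K, READ_LEN, N_READS = 21, 100, 2_000

# Hypothetical samples: A is one 5 kb sequence, B is twenty unrelated 5 kb sequences.
low_diversity = [random_genome(5_000, rng)]
high_diversity = [random_genome(5_000, rng) for _ in range(20)]

reads_a = sample_reads(low_diversity, N_READS, READ_LEN, rng)   # ~40x coverage
reads_b = sample_reads(high_diversity, N_READS, READ_LEN, rng)  # ~2x coverage

print("sample A distinct k-mers:", len(distinct_kmers(reads_a, K)))
print("sample B distinct k-mers:", len(distinct_kmers(reads_b, K)))
```

Both samples have the same read count. However, A can never exceed the 4,980 distinct 21-mers in its single 5 kb sequence, while B keeps finding new k-mers.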

The simplest explanation would be that dataset B is both more diverse
and has lower coverage than dataset A, I think.  I would guess that
if you generated 6 times as much data for sample B, then diginorm would
leave you with many more reads, although this is a bit dependent on the
diversity of sample B.
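The diginorm side of this argument can be sketched as follows. This is a simplified, exact-counting illustration of the median k-mer abundance rule behind khmer's normalize-by-median.py. khmer itself uses a probabilistic count table rather than a Python dict. The cutoff of 20, k=21, and the simulated "sample B" are assumptions chosen for illustration. Because the sample is diverse and shallowly sequenced, few reads reach the cutoff, so diginorm discards little. Sequencing six times as much then leaves many more reads after normalization.

```python
import random
from collections import Counter
from statistics import median

def kmers(read, k):
    """All overlapping k-mers of a read (reverse complements are not merged)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def normalize_by_median(reads, k, cutoff):
    """Keep a read only if the median count of its k-mers, among reads kept so far, is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept

def sample_reads(genomes, n_reads, read_len, rng):
    """Draw error-free reads uniformly from a list of genomes."""
    reads = []
    for _ in range(n_reads):
        g = rng.choice(genomes)
        start = rng.randrange(len(g) - read_len + 1)
        reads.append(g[start:start + read_len])
    return reads

rng = random.Random(1)
K, CUTOFF, READ_LEN = 21, 20, 100

# Hypothetical diverse sample: 20 unrelated 5 kb sequences (100 kb in total).
sample_b = ["".join(rng.choice("ACGT") for _ in range(5_000)) for _ in range(20)]

shallow = sample_reads(sample_b, 2_000, READ_LEN, rng)   # ~2x read coverage
deeper = sample_reads(sample_b, 12_000, READ_LEN, rng)   # six times as much data

kept_shallow = normalize_by_median(shallow, K, CUTOFF)
kept_deeper = normalize_by_median(deeper, K, CUTOFF)
print("shallow:", len(shallow), "reads in,", len(kept_shallow), "kept")
print("deeper: ", len(deeper), "reads in,", len(kept_deeper), "kept")
```

At coverage this far below the cutoff, nearly every read's median k-mer count stays under 20, so the retained read count grows with the input. Only when a sample's coverage approaches the cutoff does diginorm start discarding most new reads.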

cheers,
--titus

> 
> If you want to check the total number of k-mers, we have some scripts
> in khmer to do that.  See 'abundance-dist-single.py' here,
> 
> 	http://khmer.readthedocs.org/en/latest/scripts.html#scripts-counting
> 
> cheers,
> --titus
> -- 
> C. Titus Brown, ctb at msu.edu
> 
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
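As a rough picture of what a k-mer abundance distribution like the one from 'abundance-dist-single.py' contains, here is a minimal pure-Python sketch. It is not khmer's implementation. khmer uses a probabilistic counting table, and this sketch does not merge reverse-complement k-mers. The function name and toy reads are hypothetical. The distribution records how many distinct k-mers occur once, twice, and so on. From it you can read off both the number of distinct k-mers and the total number of k-mer occurrences, which are easy to confuse when comparing assemblers' k-mer counts.

```python
from collections import Counter

def kmer_abundance_distribution(reads, k):
    """Map abundance -> number of distinct k-mers seen exactly that many times."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter(counts.values())

# Toy reads; real input would be the (normalized) FASTQ/FASTA reads.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTTTTTTTTT"]
hist = kmer_abundance_distribution(reads, 5)

for abundance in sorted(hist):
    print(abundance, hist[abundance])
print("distinct k-mers:", sum(hist.values()))
print("total k-mers:", sum(a * n for a, n in hist.items()))
```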

-- 
C. Titus Brown, ctb at msu.edu



