[khmer] less reads but more kmers?
C. Titus Brown
ctb at msu.edu
Sat Jan 18 05:00:03 PST 2014
On Fri, Jan 17, 2014 at 09:31:32PM -0200, Nacho Caballero wrote:
> I used khmer to digitally normalize two assemblies:
>
> - After normalization, Assembly A has *1.5 million reads*, and during
> assembly SPAdes uses *116 million* kmers (k=37)
> - After normalization, Assembly B has *1.5 million reads*, during
> assembly SPAdes uses *612 million* kmers (k=37)
>
> I followed the same protocol on both assemblies (quality filtering with
> Trimmomatic, 3-pass normalization, etc.), so I don't understand why
> assembly B, with 16x fewer reads, has 8x more kmers than assembly A.
>
> What are some possible explanations?
Barring some extraordinarily bizarre bug, the answer *must* be that SPAdes
is *choosing to use* more k-mers... I'll ask the SPAdes authors ;)
If you want to check the total number of k-mers, we have some scripts
in khmer to do that. See 'abundance-dist-single.py' here,
http://khmer.readthedocs.org/en/latest/scripts.html#scripts-counting
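For a quick independent sanity check on a small subsample, something like
the sketch below counts k-mers exactly in plain Python. It is not khmer's
implementation and does not use the khmer API; the helper names
(canonical, count_kmers, read_fastq) are made up for illustration, and it
assumes 4-line FASTQ records and that a k-mer and its reverse complement
should be counted as one.

```python
# Hypothetical helpers for illustration only; not part of khmer or SPAdes.
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(sequences, k=37):
    """Count canonical k-mers exactly, skipping any k-mer with a non-ACGT base."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def read_fastq(path):
    """Yield sequence lines from a FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip()


if __name__ == "__main__":
    # Tiny worked example: ACG and CGT are reverse complements of each other.
    counts = count_kmers(["ACGTA"], k=3)
    print("distinct k-mers:", len(counts))
    print("total k-mers:", sum(counts.values()))
```

To run it on real data, pass count_kmers(read_fastq("subsample.fq"), k=37),
where the file name is a placeholder. Exact counting keeps every distinct
k-mer in memory, so it is only practical on a subsample; for the full read
sets, use the khmer scripts.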
cheers,
--titus
--
C. Titus Brown, ctb at msu.edu