[khmer] Partitioning question

Thu Aug 1 05:28:50 PDT 2013

[ redirecting discussion to the khmer at lists.idyll.org list -- see

http://lists.idyll.org/listinfo/khmer ]

On Wed, Jul 31, 2013 at 04:00:18PM -0700, Bill Nelson wrote:
> I have a relatively large dataset (~300M Illumina reads) from a relatively
> simple community (~18 organisms). I am trying to partition the data to see
> if my assemblies improve.
> 
> When I run partition-graph.py, the resulting pmap files always have
> approximately the same size. Is that expected behavior when I know the
> organisms are present at very different abundances?

Hi Bill,

Short version: yes.  The pmaps are created to divide all of the k-mers
roughly evenly (well, it's a bit more complicated, but that's the basic
idea), so the number and size of the pmap files will correlate with overall
diversity rather than with abundance.
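
As a toy illustration of why k-mer-based splitting is insensitive to
abundance, here is a minimal Python sketch. It is not khmer code and uses no
khmer API; the helper names (distinct_kmers, sample_reads), the random
"genome", and the read counts are all made up for the example. It only shows
that the number of distinct k-mers stays about the same when the same
sequence is sampled at ten times the depth.

```python
import random


def distinct_kmers(reads, k):
    """Return the set of distinct k-mers found across all reads."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def sample_reads(genome, n_reads, read_len, rng):
    """Draw n_reads substrings of length read_len uniformly from genome."""
    max_start = len(genome) - read_len
    starts = [rng.randrange(max_start + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


# Hypothetical toy data: one random 2 kb "genome", sampled at two depths.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
k = 20

low = sample_reads(genome, 2000, 100, rng)     # roughly 100x coverage
high = sample_reads(genome, 20000, 100, rng)   # roughly 1000x coverage

n_genome = len(distinct_kmers([genome], k))
n_low = len(distinct_kmers(low, k))
n_high = len(distinct_kmers(high, k))

print("reads:", len(low), "vs", len(high))
print("distinct k-mers:", n_low, "vs", n_high, "(genome has", n_genome, ")")
```

The read count differs tenfold between the two samples, but the distinct
k-mer count is bounded by what the underlying sequence contains. Real data
also carries sequencing errors, which add spurious k-mers; this sketch
ignores them.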

Also, for so few reads, I would expect digital normalization to give you good
results; partitioning is probably overkill (although there are other reasons
why it's not a bad idea).

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu