[khmer] speed up partitioning?

C. Titus Brown ctb at msu.edu
Tue Feb 18 19:39:09 PST 2014


On Wed, Feb 12, 2014 at 11:54:49AM -0800, Cedar McKay wrote:
> Hello, thanks for making this software available, it's a big help.
> I'm hoping you will be able to offer advice on how I might choose different settings to accelerate do-partition.py. It's been running for a month now, and that is just too slow for my needs.

Hi Cedar,

agreed :).

Have you seen:

https://khmer-protocols.readthedocs.org/en/v0.8.4/

?  The filter-below-abund code in the metagenome protocol should speed things
up for you dramatically...

> What I have:
> Environmental sample of 68 million paired, interleaved Illumina reads. Each is about 150 nt long. Expected diversity is high.
> The computer I'm using has 48 cores (AMD 6176, 2.3 GHz) and 256 GB RAM.
> 
> The problem:
> My problem is that it's been running for about a month now, using 90% of memory and 4600% CPU. In other words, it appears to be using available resources. If I read the output (pasted below) correctly, I can expect to see 6566 partitions. After a month, it is working on partition 2178, so I'm only 1/3 of the way through.

Yep.
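
For a rough sense of what that means in wall-clock time, here is a minimal
sketch of the arithmetic. It assumes "about a month" is 30 days and that the
remaining partitions take as long on average as the finished ones, which may
well not hold; the function name is made up for illustration.

```python
def estimate_progress(done, total, elapsed_days):
    """Linear extrapolation of partitioning progress.

    Assumes every partition costs about the same time on average,
    which is only a rough approximation.
    """
    fraction_done = done / total
    # Remaining time scales with the ratio of unfinished to finished work.
    remaining_days = elapsed_days * (total - done) / done
    return fraction_done, remaining_days


# Numbers from the message above; 30 days is an assumed value for "a month".
fraction, remaining = estimate_progress(done=2178, total=6566, elapsed_days=30)
print(f"{fraction:.0%} done, roughly {remaining:.0f} more days at this rate")
```

Under those assumptions the run would need about two more months, so it is
worth changing the approach rather than waiting.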

> My questions:
> Given my dataset and computer, are the parameters I chose reasonable?

I'm sort of astonished it's taking that long... it means you probably have
some high coverage or very repetitive "stuff" in there.

> What effect does k-mer size have on speed and sensitivity?

We've never tested this.

> What is the practical effect of varying subset-size?

It results in an increase in parallelism towards the ends of the 

You might also turn on stop_big_traversals with the --no-big-traverse option.
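
A minimal sketch of how that option might be added to the command line. It
only builds and prints the command, it does not run anything. The helper name,
the "sample" graph base name and the "reads.fa" input file are hypothetical,
and the argument order (graph base name first, then input files) is an
assumption to check against do-partition.py --help for your khmer version.

```python
import shlex


def build_do_partition_cmd(graphbase, inputs, no_big_traverse=True):
    """Assemble a do-partition.py argument list (hypothetical helper)."""
    cmd = ["do-partition.py"]
    if no_big_traverse:
        # Corresponds to the stop_big_traversals setting mentioned above.
        cmd.append("--no-big-traverse")
    # Assumed positional layout: graph base name, then input read files.
    cmd.append(graphbase)
    cmd.extend(inputs)
    return cmd


# "sample" and "reads.fa" are placeholder names, not from the original run.
cmd = build_do_partition_cmd("sample", ["reads.fa"])
print(shlex.join(cmd))
```

Keep the other options you were already passing and add this one alongside
them.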

best,
--titus



