[khmer] speed up partitioning?
C. Titus Brown
ctb at msu.edu
Tue Feb 18 19:39:09 PST 2014
On Wed, Feb 12, 2014 at 11:54:49AM -0800, Cedar McKay wrote:
> Hello, thanks for making this software available, it's a big help.
> I'm hoping you will be able to offer advice on how I might choose different settings to accelerate do-partition.py. It's been running for a month now, and that is just too slow for my needs.
Have you seen the metagenome protocol? The filter-below-abund code in it
should speed things up for you dramatically...
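The idea behind filter-below-abund is to drop reads whose median k-mer abundance is above a cutoff, since very-high-coverage, repetitive reads create the dense graph regions that stall partitioning. Here is a toy sketch of that idea in plain Python -- illustrative only, not khmer's actual implementation, and the k and cutoff values are arbitrary:

```python
from collections import Counter

K = 8        # toy k-mer size; real runs typically use k around 20-32
CUTOFF = 50  # discard reads whose median k-mer abundance exceeds this

def kmers(seq, k=K):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def filter_below_abund(reads, cutoff=CUTOFF, k=K):
    """Keep only reads whose median k-mer abundance is <= cutoff."""
    counts = Counter()
    for read in reads:            # first pass: count all k-mers
        counts.update(kmers(read, k))
    kept = []
    for read in reads:            # second pass: filter on median abundance
        abunds = sorted(counts[km] for km in kmers(read, k))
        median = abunds[len(abunds) // 2]
        if median <= cutoff:
            kept.append(read)
    return kept
```

For example, 60 identical copies of one read would all be dropped (median abundance 60 > 50) while a unique read is kept -- which is exactly the high-coverage "stuff" that slows partitioning down.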
> What I have:
> Environmental sample of 68 million paired, interleaved illumina reads. Each is about 150nt long. Expected diversity is high.
> The computer I'm using has 48 cores (AMD 6176 2.3Ghz) and 256GB ram.
> The problem:
> My problem is that it's been running for about a month now, using 90% of memory, and 4600% cpu. In other words, it appears to be using available resources. If I read the output (pasted below) correctly, I can expect to see 6566 partitions. After a month, it is working on partition 2178, so I'm only 1/3 of the way through.
> My questions:
> Given my dataset and computer, are the parameters I chose reasonable?
I'm sort of astonished it's taking that long... it means you probably have
some high coverage or very repetitive "stuff" in there.
> What effect does kmer size have on speed and sensitivity?
We've never tested this.
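Untested on khmer itself, but the basic tradeoff can be illustrated with a toy experiment: at small k, unrelated sequences share many k-mers purely by chance, which adds spurious connections to the graph (hurting specificity and, for partitioning, speed), while at large k there are more distinct k-mers to store. A quick sketch with random sequences (lengths and k values arbitrary):

```python
import random

random.seed(42)

def random_seq(n):
    return "".join(random.choice("ACGT") for _ in range(n))

def kmer_set(seq, k):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# two completely unrelated "reads" of 10 kb each
a, b = random_seq(10_000), random_seq(10_000)

for k in (4, 8, 16, 32):
    shared = len(kmer_set(a, k) & kmer_set(b, k))
    total = len(kmer_set(a, k))
    print(f"k={k:2d}: {total:5d} distinct k-mers, {shared:5d} shared by chance")
```

At k=4 the two sequences share essentially every possible 4-mer; by k=32 they share none, so spurious graph connections vanish as k grows.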
> What is the practical effect of varying subset-size?
It results in an increase in parallelism towards the end of the run: a
smaller subset size yields more subsets, each of which can be partitioned
in an independent thread before the results are merged.
You might also turn on stop_big_traversals by passing --no-big-traverse;
that skips traversal of very large, highly connected regions, which are
typically what makes partitioning slow.
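For a rough sense of the subset-size/parallelism relationship: do-partition.py hands out --subset-size reads at a time to worker threads, so with the poster's numbers the arithmetic looks roughly like this (subset_plan is a hypothetical helper for illustration, not khmer code, and it ignores merge cost):

```python
import math

def subset_plan(n_reads, subset_size, n_threads):
    """Back-of-the-envelope view of how --subset-size shapes the work."""
    n_subsets = math.ceil(n_reads / subset_size)
    # "waves": how many times each thread must pick up a fresh subset
    waves = math.ceil(n_subsets / n_threads)
    return n_subsets, waves

# the poster's numbers: 68M reads on 48 cores
for size in (100_000, 1_000_000, 10_000_000):
    subsets, waves = subset_plan(68_000_000, size, 48)
    print(f"subset-size {size:>10,}: {subsets:4d} subsets, {waves:3d} waves")
```

A smaller subset size keeps all 48 cores busy until the very end of the run, at the price of more subsets to merge afterwards.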
More information about the khmer mailing list