[khmer] How to speed up the filter-below-abund script ?

Tue Mar 12 06:41:37 PDT 2013

On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
> Metagenome assembly :
> My data :
> - original (quality filtered) data : 4463243 reads (75 nt) (Illumina)
> 1/ Single pass digital normalization with normalize-by-median (C=20)
> ==> file .keep of 2560557 reads
> 2/ generated a hash table by load-into-counting on the .keep file
> ==> file .kh of ~16 GB (huge file?!)
> 3/ filter-below-abund with C=100 from the two previous files (table.kh
> and reads.keep)
> Still running after 24 hours  :(
>
> Any advice to speed up this step? ... and the others (partitioning ...)?
>
> I have access to an HPC cluster: ~3000 cores.

Hi Alexis,

filter-below-abund and filter-abund have occasional bugs that prevent them
from completing.  I would kill the job and restart it.  With that few reads,
the whole pipeline should take no more than a few hours.

Note that most of what khmer does cannot easily be distributed across
multiple chassis, so the extra cores on the cluster will not help much.
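
On the size of the .kh file: as far as I know it is set by the table
parameters given to load-into-counting, not by the number of reads.  A rough
sketch, assuming one byte per counter and a table layout of N tables of x
entries each (the function name and the values 4 and 4e9 below are
illustrative assumptions, not taken from the run described above):

```python
def counting_table_bytes(n_tables, table_size):
    """Approximate counting-table size in bytes.

    Assumes one byte per counter, so size = n_tables * table_size.
    """
    return int(n_tables) * int(table_size)


# Illustrative values only: 4 tables of 4e9 entries each.
size = counting_table_bytes(4, 4e9)
print(f"{size / 1e9:.0f} GB")
```

If that assumption holds, a ~16 GB file is expected for such settings even
with only a few million reads, and smaller tables would give a smaller file.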

best,
--titus
-- 
C. Titus Brown, ctb at msu.edu