[khmer] How to speed up the filter-below-abund script ?

C. Titus Brown ctb at msu.edu
Tue Mar 12 08:16:35 PDT 2013


On Tue, Mar 12, 2013 at 04:15:05PM +0100, Alexis Groppi wrote:
> Hi Titus,
>
> Thanks for your answer.
> Actually, this is my second attempt with filter-below-abund.
> The first time, I thought the problem came from the location of my
> table.kh file: it was on a storage element with poor I/O performance.
> I killed the job after 24 h, moved the file to a better location, and re-ran it,
> but with the same result: no completion after 24 h.
>
> Any idea?
>
> Thanks
>
> Cheers from Bordeaux :)
>
> Alexis
>
> PS: The command line was the following:
>
> ./filter-below-abund.py 174r1_table.kh 174r1_prinseq_good_bFr8.fasta.keep
>
> Is this correct?

Yes, looks right... Can you try with the bleeding-edge branch, which now
incorporates a potential fix for this issue?

thanks,
--titus

> On 12/03/2013 14:41, C. Titus Brown wrote:
>> On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
>>> Metagenome assembly:
>>> My data:
>>> - original (quality-filtered) data: 4463243 reads (75 nt) (Illumina)
>>> 1/ Single-pass digital normalization with normalize-by-median (C=20)
>>> ==> .keep file of 2560557 reads
>>> 2/ Generated a hash table with load-into-counting on the .keep file
>>> ==> .kh file of ~16 GB (huge file?!)
>>> 3/ filter-below-abund with C=100 on the two previous files (table.kh
>>> and reads.keep)
>>> Still running after 24 hours :(
>>>
>>> Any advice to speed up this step... and the others (partitioning...)?
>>>
>>> I have access to an HPC cluster: ~3000 cores.
>> Hi Alexis,
>>
>> filter-below-abund and filter-abund have occasional bugs that prevent them
>> from completing.  I would kill and restart.  For that few reads it should
>> take no more than a few hours to do everything.
>>
>> Most of what khmer does cannot easily be distributed across multiple chassis,
>> note.
>>
>> best,
>> --titus

-- 
C. Titus Brown, ctb at msu.edu
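
For readers checking the numbers quoted in this thread, here is a minimal Python sketch. It works out what fraction of reads survived normalization, and why a ~16 GB .kh file is plausible for only 2.5 million reads. The size estimate assumes the counting table stores one byte per counter, so its size is set by the table parameters and not by the number of reads. The values of 4 tables and 4e9 entries per table are hypothetical, chosen only because they multiply out to 16 GB; the thread does not say which parameters were actually used.

```python
def retained_fraction(reads_in, reads_kept):
    """Fraction of input reads kept after digital normalization."""
    return reads_kept / reads_in


def counting_table_bytes(n_tables, table_size):
    """Rough size of a counting table in bytes.

    Assumption: one byte per counter, so the size is simply
    n_tables * table_size, regardless of how many reads were loaded.
    """
    return n_tables * table_size


if __name__ == "__main__":
    # Read counts quoted in the thread.
    frac = retained_fraction(4463243, 2560557)
    print(f"reads retained after normalization: {frac:.1%}")

    # Hypothetical table parameters (not stated in the thread).
    size = counting_table_bytes(n_tables=4, table_size=4_000_000_000)
    print(f"estimated table size: {size / 1e9:.0f} GB")
```

Under that assumption, a smaller table would shrink the .kh file, at the cost of a higher false-positive rate in the k-mer counts.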



