[khmer] exceeding defined RAM limits?

Oh, Julia (NIH/NHGRI) [F] julia.oh at nih.gov
Tue Dec 17 11:53:18 PST 2013

Hi all,

Hopefully this is a simple error on my end that’s causing a memory failure:

Starting with a fairly large file (estimated ~872400000 reads, ~185 GB of Illumina data):

I’m running the following command on a large-memory machine. As I understand it, the first normalization step should consume 240 GB of RAM, and it does:

$python2.7 /home/ohjs/khmer/scripts/normalize-by-median.py -C 20 -k 20 -N 4 -x 60e9 --savehash round2.unaligned_ref.kh -R round2.unaligned_1.report round2.unaligned; 
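To frame the numbers: my understanding (an assumption on my part, based on khmer's CountMin-style counting hash) is that khmer allocates -N tables of -x one-byte entries each, so the expected footprint is just their product:

```python
def khmer_hash_bytes(n_tables, table_size):
    # Assumed model: -N tables, each with -x one-byte entries,
    # so total memory is simply n_tables * table_size bytes.
    return n_tables * table_size

gb = khmer_hash_bytes(4, 60e9) / 1e9
print(gb)  # 240.0 GB, matching the first pass's observed usage
```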

It seems to finish after removing ~33% of the reads, leaving ~118 GB of sequence data:

tail round2.unaligned_1.report
871500000 584890641 0.67113097074
871600000 584966095 0.671140540385
871700000 585039359 0.671147595503
871800000 585109434 0.671150991053
871900000 585174062 0.671148138548
872000000 585244067 0.671151452982
872100000 585314163 0.671154871001
872200000 585388191 0.671162796377
872300000 585459804 0.671167951393
872400000 585529439 0.671170837918
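(Reading the report's last line as reads processed, reads kept, and fraction kept, the ~33% removal figure checks out:)

```python
total, kept = 872_400_000, 585_529_439  # last line of the report
frac_kept = kept / total
print(frac_kept)  # ~0.6712 kept, i.e. ~33% of reads removed
```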

Then I run the filtering step, which seems to run fine and shrinks the file considerably, to about 54 GB:
$python2.7 /home/ohjs/khmer/scripts/filter-abund.py round2.unaligned_ref.kh round2.unaligned.keep; 

Then I have a second normalization step:

$python2.7 /home/ohjs/khmer/scripts/normalize-by-median.py -C 5 -k 20 -N 4 -x 16e9 round2.unaligned.keep.abundfilt; 
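By the same -N × -x arithmetic as the first pass (again assuming one byte per table entry), this second pass should need far less than my memory limit, for either table size I've tried:

```python
# Expected hash-table memory for the second normalization pass,
# for both -x values I've tried (16e9 and 32e9), with -N 4:
for x in (16e9, 32e9):
    print(4 * x / 1e9)  # 64.0 then 128.0 GB, both well under 249 GB
```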

I expected to max out at 64 GB of RAM for the hash tables (I’ve also tried -x 32e9), but I get the following RAM usage report:

4986693.biobos elapsed time:        23358 seconds
4986693.biobos walltime:         06:28:36 hh:mm:ss
4986693.biobos memory limit:       249.00 GB
4986693.biobos memory used:        249.76 GB
4986693.biobos cpupercent used:     98.00 %

around read 299200000, at which point my job gets killed for exceeding its memory allocation.

Any suggestions to address this? I’m happy to provide any clarification. 
