[khmer] partition-graph memory requirements
Jens-Konrad Preem
jpreem at ut.ee
Wed Apr 10 01:23:49 PDT 2013
Hi,
In an extreme act of foolishness I seem to have lost my error logs.
(I have been messing with the different scripts here a lot and got
rid of some of the outputs in an ill-thought-out "housekeeping" event.)
I am attaching the PBS scripts that I used to get as far as I
am. I used a separate script for each stage of the normalize-and-partition
pipeline, so I'd have time to look at the outputs and get a sense of the
time taken by each. The scripts are, in order:
supkhme (normalize), suprem (filter-below), supload (load-graph), and
finally supart (partition-graph). (As can be seen, I am trying to do the
metagenome analysis as per guide.txt.)
All the previous scripts completed without complaint, producing the 5.2
GB "graafik" graph.
The partition-graph step has failed a few times after running for an hour
or so, always with error messages concerning memory. The latest script
requests 240 GB of memory, which is the maximum I can request in the near
future, and it still failed with a memory-related error.
I am now working on reproducing the error so I can supply you with the
.log and .error files; if no error occurs, so much the better for me,
of course.
This time I decided to try the different k-values suggested by
https://khmer.readthedocs.org/en/latest/guide.html (20 for
normalization and 32 for partitioning); those should make the graph file
all the bigger. I had used the smaller values to avoid running out of
memory, but as that doesn't seem to help, what the heck. ;D Right now I am
at the load-graph stage with the new set. When it completes in a few hours
I'll start the partition-graph run, and then we will see if it dies
within an hour. If so, I'll post a new set of scripts and logs.
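For reference, the guide's k-values would change the pipeline commands roughly as follows. This is only a sketch based on the attached scripts; the file names, table sizes (-N/-x), and paths are copied from them and may need adjusting for the new run:

```shell
# Digital normalization at k=20 (as recommended by the khmer guide),
# instead of the k=17 used in the attached scripts:
./khmer/scripts/normalize-by-median.py -k 20 -s ktabel -N 4 -x 11e9 -p -C 20 HFinterleaved.fastq

# Build the graph at k=32 for partitioning:
./khmer/scripts/load-graph.py graafik -k 32 -N 4 -x 11e9 HFinterleaved.fastq.keep.below
```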
Thank you for your time,
Jens-Konrad
On 04/10/2013 04:18 AM, Eric McDonald wrote:
> Hi Jens-Konrad,
>
> Sorry for the delayed response. (I was on vacation yesterday and
> hoping that someone more familiar with the partitioning code would
> answer.)
>
> My understanding of the code is that decreasing the subset size will
> increase the number of partitions but will not change the overall
> graph coverage. Therefore, I would not expect it to lower memory
> requirements. (The overhead from additional partitions might raise
> them some, but I have not analyzed the code deeply enough to say one
> way or another about that.) As far as changing the number of threads
> goes, each thread does seem to maintain a local list of traversed
> k-mers (hidden in the C++ implementation) but I do not yet know how
> much that would impact memory usage. Have you tried using fewer
> threads?
>
> But, rather than guessing about causation, let's try to get some more
> diagnostic information. Does the script die immediately? (How long
> does the PBS job execute before failure?) Can you attach the output
> and error files for a job, and also the job script? What does
>     qstat -f <job_id>
> where <job_id> is the ID of your running job, tell you about memory usage?
>
> Thanks,
> Eric
>
>
>
>
> On Mon, Apr 8, 2013 at 3:34 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>
> Hi,
> I am having trouble completing a partition-graph.py job.
> No matter the configuration, it seems to terminate with error
> messages hinting at low memory, etc.*
> Does lowering the subset size reduce the memory use? What about
> lowering the number of parallel threads?
> The graafik.ht file is 5.2 GB, and I had the script
> running as a PBS job with 240 GB RAM allocated. (That's as much as
> I can get; maybe I'll have an opportunity next week to
> double it, but I wouldn't count on it.)
> Is it expected for the script to require so much RAM, or is there
> some bug or some misuse on my part? Is there any
> configuration that would get past this?
>
> Jens-Konrad Preem, MSc., University of Tartu
>
>
>
> * The latest configuration, after I decided on a smaller subset size:
> ./khmer/scripts/partition-graph.py --threads 24 --subset-size 1e4
> graafik
> terminated with
> cannot allocate memory for thread-local data: ABORT
>
>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
>
>
>
>
> --
> Eric McDonald
> HPC/Cloud Software Engineer
> for the Institute for Cyber-Enabled Research (iCER)
> and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733
--
Jens-Konrad Preem, MSc, University of Tartu
-------------- next part --------------
#PBS -N filterabund
# The job needs 1 node and 24 cores on it
#PBS -l nodes=1:ppn=24
# The job needs 44+2 GB of memory
#PBS -l vmem=46gb
#PBS -l walltime=8:00:00
# Mail is sent when the job starts and after it finishes
#PBS -M jpreem at ut.ee
#PBS -m abe
# Set the job's working directory to your directory under /storage/hpchome/<username>.
# After entering the correct directory, remove the extra # from the start of the line.
#PBS -d /gpfs/hpchome/jpreem/norm/
# Write your commands here
source activate
./khmer/sandbox/filter-below-abund.py ktabel HFinterleaved.fastq.keep
-------------- next part --------------
#PBS -N load-graph
# The job needs 1 node and 24 cores on it
#PBS -l nodes=1:ppn=24
# The job needs 44+2 GB of memory
#PBS -l vmem=46gb
#PBS -l walltime=8:00:00
# Mail is sent when the job starts and after it finishes
#PBS -M jpreem at ut.ee
#PBS -m abe
# Set the job's working directory to your directory under /storage/hpchome/<username>.
# After entering the correct directory, remove the extra # from the start of the line.
#PBS -d /gpfs/hpchome/jpreem/norm/
# Write your commands here
source activate
./khmer/scripts/load-graph.py graafik -k 17 -N 4 -x 11e9 HFinterleaved.fastq.keep.below
-------------- next part --------------
#PBS -N khmeralgushf
# The job needs 1 node and 24 cores on it
#PBS -l nodes=1:ppn=24
# The job needs 44+2 GB of memory
#PBS -l vmem=46gb
#PBS -l walltime=8:00:00
# Mail is sent when the job starts and after it finishes
#PBS -M jpreem at ut.ee
#PBS -m abe
# Set the job's working directory to your directory under /storage/hpchome/<username>.
# After entering the correct directory, remove the extra # from the start of the line.
#PBS -d /gpfs/hpchome/jpreem/norm/
# Write your commands here
source activate
./khmer/scripts/normalize-by-median.py -k 17 -s ktabel -N 4 -x 11e9 -p -C=20 HFinterleaved.fastq
-------------- next part --------------
#PBS -N partition-graph
# The job needs 1 node and 24 cores on it
#PBS -l nodes=1:ppn=24
# The job needs 240 GB of memory
#PBS -l vmem=240gb
#PBS -l walltime=8:00:00
# Mail is sent when the job starts and after it finishes
#PBS -M jpreem at ut.ee
#PBS -m abe
# Set the job's working directory to your directory under /storage/hpchome/<username>.
# After entering the correct directory, remove the extra # from the start of the line.
#PBS -d /gpfs/hpchome/jpreem/norm/
# Write your commands here
source activate
./khmer/scripts/partition-graph.py --threads 24 --subset-size 1e4 graafik
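Following the diagnostics suggested in the quoted reply, a retry could combine fewer threads (since each thread appears to keep its own list of traversed k-mers) with a live memory check. This is only a sketch; the thread count of 4 and the job ID are placeholders, not values from this thread:

```shell
# Retry partitioning with fewer threads; fewer per-thread k-mer lists may
# lower peak memory, at the cost of wall-clock time.
./khmer/scripts/partition-graph.py --threads 4 --subset-size 1e4 graafik

# While the job runs, inspect its actual memory usage from another shell
# (replace 12345 with the real PBS job ID):
qstat -f 12345 | grep -i -E 'mem|vmem'
```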