[khmer] Fwd: partition-graph memory requirements

Jens-Konrad Preem jpreem at ut.ee
Sat Apr 13 01:16:50 PDT 2013


Yes, the steady ballooning is quite obvious, especially if I take some 
time staring at the output of the top command. Thank you for your time; 
I hope that someone will look into this. As a note, might my graafik.ht 
be corrupted somehow? It is even smaller than the 50m.ht, which I was 
able to partition without trouble. As additional information for anyone 
interested: the data used was ~36M 250 bp reads.
Jens-Konrad
On 04/13/2013 05:35 AM, Eric McDonald wrote:
> Jens-Konrad,
>
> Thanks for providing this information.
>  15: resources_used.mem = 52379536kb
> 30: resources_used.mem = 90676068kb
> 45: resources_used.mem = 122543188kb
> Definitely some ballooning memory use there.
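
[Editor's note: the growth is easier to eyeball in gigabytes. A small helper
(an illustrative sketch, not part of khmer or PBS) that pulls the
resources_used.mem values out of `qstat -f` text and converts the kb figures:]

```python
import re

def mem_kb(qstat_text):
    """Extract the resources_used.mem values (reported in kb) from
    the plain-text output of `qstat -f`."""
    return [int(m) for m in re.findall(r"resources_used\.mem = (\d+)kb", qstat_text)]

# The three snapshots quoted above, taken at 15/30/45 minutes:
snapshots = """
resources_used.mem = 52379536kb
resources_used.mem = 90676068kb
resources_used.mem = 122543188kb
"""

for kb in mem_kb(snapshots):
    print(round(kb / 1024**2, 1), "GB")  # ~50.0, 86.5, 116.9 GB
```

Roughly 37 GB of growth every 15 minutes, which matches the "ballooning" description.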
>
> One more thing you may wish to examine from the command line is:
>   qmgr -c "l s" | grep 'resources_'
> This will tell you about any default resources (such as physical 
> memory) that your PBS server is assigning to new jobs. That said, I do 
> believe that your jobs are exhausting available memory.
> So, now the question is whether anything can be done about it. Unless 
> someone with more experience with the partitioning code decides to 
> speak up, I am going to have to analyze your chosen parameters and the 
> pieces of code in question to see if I can deduce anything. I might 
> not be able to do this until Monday - I am too tired to do it tonight 
> (here in US Eastern time) and have a busy weekend ahead of me.
>
> I promise I will get back to you with some better answers if no one 
> else decides to say anything. While you are waiting for a response and 
> if you want to test your hypothesis about the number of threads 
> correlating to increased memory use, then I would recommend using a 
> smaller data set and seeing what kind of scaling in the memory use you 
> see as you change the number of threads.
>
> Have a good weekend,
>   Eric
>
>
>
> On Fri, Apr 12, 2013 at 7:30 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>
>     On 04/11/2013 02:58 AM, Eric McDonald wrote:
>>     Forgot to reply to all - forwarding in case the answer helps
>>     anyone else on the list....
>>
>>     ---------- Forwarded message ----------
>>     From: Eric McDonald <emcd.msu at gmail.com>
>>     Date: Wed, Apr 10, 2013 at 7:57 PM
>>     Subject: Re: [khmer] partition-graph memory requirements
>>     To: Jens-Konrad Preem <jpreem at ut.ee>
>>
>>
>>     Hi,
>>
>>     Sorry for the delayed reply.
>>
>>     Thanks for sharing your job scripts. I notice that you are
>>     specifying the 'vmem' resource. However, if PBS is also enforcing
>>     a limit on the 'mem' resource (physical memory), then you may be
>>     encountering that limit. Do you know what default value is
>>     assigned by your site's PBS server for the 'mem' resource?
>>
>>     Again, if you run:
>>       qstat -f <job_id>
>>     you should be able to determine both the resources allocated for
>>     the job and how much the job is actually using. Please share the
>>     results of this command if you would like help interpreting them
>>     and, if necessary, figuring out how to change your PBS resource
>>     request.
>>
>>     As a side note, smaller k-mer lengths mean that more k-mers are
>>     extracted from each sequence. This means that the hash tables
>>     are more densely populated, and that, in turn, means you are
>>     more likely to need larger hash tables to avoid a significant
>>     false positive rate. Perhaps the better way to put it is that,
>>     for a given table size, the amount of memory used by the hash
>>     tables is independent of k-mer size; changing the k-mer length
>>     does not affect memory usage for many parts of khmer. (I would
>>     have to look more closely to see how this affects the
>>     partitioning code.)
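
[Editor's note: to put rough numbers on the density point above - a read of
length L yields L - k + 1 k-mers, and khmer's presence tables behave roughly
like a Bloom filter, so the false positive rate climbs steeply with occupancy.
A sketch using the standard Bloom-filter approximation, not khmer's exact
internals:]

```python
def kmers_per_read(read_len, k):
    """A read of length L contains L - k + 1 overlapping k-mers."""
    return max(read_len - k + 1, 0)

def approx_false_positive_rate(occupancy, n_tables):
    """If each of n_tables hash tables is a fraction `occupancy` full,
    an absent k-mer is falsely reported present when all of them
    collide: roughly occupancy ** n_tables (Bloom-filter estimate)."""
    return occupancy ** n_tables

# For the ~36M x 250 bp data set mentioned in this thread:
print(36_000_000 * kmers_per_read(250, 20))  # k=20: 231 k-mers/read
print(36_000_000 * kmers_per_read(250, 32))  # k=32: 219 k-mers/read

# Same table size, rising occupancy -> sharply rising FP rate:
print(approx_false_positive_rate(0.5, 4))  # 0.0625
print(approx_false_positive_rate(0.8, 4))  # ~0.41
```

So the k=20 run stores only a few percent more k-mers than the k=32 run; the
table size, not k, dominates memory use, just as described above.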
>>
>>     Hope that helps,
>>       Eric
>>
>>
>>
>>     On Wed, Apr 10, 2013 at 4:23 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>>
>>         Hi,
>>
>>         In an extreme act of foolishness, I seem to have lost my
>>         error logs. (I have been messing with the different scripts
>>         here a lot and got rid of some of the outputs in an
>>         ill-thought-out "housekeeping" event.)
>>
>>         I attach a bunch of PBS scripts that I used to get as far
>>         as I am. I used a separate script for each stage of the
>>         normalize-and-partition pipeline, so I'd have time to look at
>>         the outputs and get a sense of the time taken by each. The
>>         scripts are in the following order: supkhme (normalize),
>>         suprem (filter-below), supload (load-graph), and finally
>>         supart (partition-graph). (As can be seen, I try to do the
>>         metagenome analysis as per the guide.txt.)
>>         All the previous scripts completed without complaint,
>>         producing the 5.2 GB "graafik" graph.
>>
>>         The partition-graph step had failed a few times after
>>         running for an hour or so, always with error messages
>>         concerning memory. The latest script demands 240 GB of
>>         memory, which is the maximum I can request in the near
>>         future, and it still failed with an error message concerning
>>         memory.
>>
>>         Right now I am working on reproducing the error, so I can
>>         then supply you with .log and .error files; if no error
>>         occurs, so much the better for me, of course.
>>         I decided to try different k-values this time, as suggested by
>>         https://khmer.readthedocs.org/en/latest/guide.html (20 for
>>         normalization and 32 for partitioning); those should make the
>>         graph file all the bigger. I used the smaller ones to avoid
>>         running out of memory, but as that doesn't seem to help,
>>         what the heck. ;D Right now I am at the load-graph stage
>>         with the new set. As it will complete in a few hours, I'll
>>         start the partition-graph run and then we will see if it
>>         dies within an hour. If so, I'll post a new set of scripts
>>         and logs.
>>
>>         Thank you for your time,
>>         Jens-Konrad
>>
>>
>>
>>
>>         On 04/10/2013 04:18 AM, Eric McDonald wrote:
>>>         Hi Jens-Konrad,
>>>
>>>         Sorry for the delayed response. (I was on vacation yesterday
>>>         and hoping that someone more familiar with the partitioning
>>>         code would answer.)
>>>
>>>         My understanding of the code is that decreasing the subset
>>>         size will increase the number of partitions but will not
>>>         change the overall graph coverage. Therefore, I would not
>>>         expect it to lower memory requirements. (The overhead from
>>>         additional partitions might raise them some, but I have not
>>>         analyzed the code deeply enough to say one way or another
>>>         about that.) As far as changing the number of threads goes,
>>>         each thread does seem to maintain a local list of traversed
>>>         k-mers (hidden in the C++ implementation), but I do not yet
>>>         know how much that would impact memory usage. Have you
>>>         tried using fewer threads?
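
[Editor's note: if per-thread traversal state is indeed the culprit, the
expected shape is linear growth in thread count on top of the fixed graph
cost. A toy model - the 8 GB per-thread figure is purely hypothetical, chosen
only to illustrate the scaling, not measured from khmer:]

```python
def estimated_peak_gb(graph_gb, per_thread_gb, n_threads):
    """Toy linear model: a fixed cost for the loaded graph plus a
    hypothetical per-thread cost for private traversal state."""
    return graph_gb + per_thread_gb * n_threads

# 5.2 GB graph, as in this thread; 8 GB/thread is an assumption:
for threads in (1, 4, 24):
    print(threads, "threads ->", estimated_peak_gb(5.2, 8.0, threads), "GB")
```

Under this (unverified) model, a 4-thread run stays modest while a 24-thread
run approaches the 240 GB ceiling, which is why comparing thread counts on a
small data set is a cheap way to test the hypothesis.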
>>>
>>>         But, rather than guessing about causation, let's try to get
>>>         some more diagnostic information. Does the script die
>>>         immediately? (How long does the PBS job execute before
>>>         failure?) Can you attach the output and error files for a
>>>         job, and also the job script? What does
>>>           qstat -f <job_id>
>>>         where <job_id> is the ID of your running job, tell you about
>>>         memory usage?
>>>
>>>         Thanks,
>>>           Eric
>>>
>>>
>>>
>>>
>>>         On Mon, Apr 8, 2013 at 3:34 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>>>
>>>             Hi,
>>>             I am having trouble completing a partition-graph.py job.
>>>             No matter the configuration, it seems to terminate with
>>>             error messages hinting at low memory, etc.*
>>>             Does lowering the subset size reduce the memory use?
>>>             What about lowering the number of parallel threads?
>>>             The graafik.ht file is 5.2 GB; I had the script running
>>>             as a PBS job with 240 GB of RAM allocated. (That's as
>>>             much as I can get; maybe I'll have an opportunity next
>>>             week to double it, but I wouldn't count on it.)
>>>             Is it expected for the script to require so much RAM,
>>>             or is there some bug or some misuse on my part? Is
>>>             there any configuration that would get past this?
>>>
>>>             Jens-Konrad Preem, MSc., University of Tartu
>>>
>>>
>>>
>>>             * the latest configuration, after I decided to try a
>>>             smaller subset size:
>>>             ./khmer/scripts/partition-graph.py  --threads 24
>>>             --subset-size 1e4 graafik
>>>             terminated with
>>>             cannot allocate memory for thread-local data: ABORT
>>>
>>>
>>>             _______________________________________________
>>>             khmer mailing list
>>>             khmer at lists.idyll.org
>>>             http://lists.idyll.org/listinfo/khmer
>>>
>>>
>>>
>>>
>>>         -- 
>>>         Eric McDonald
>>>         HPC/Cloud Software Engineer
>>>           for the Institute for Cyber-Enabled Research (iCER)
>>>           and the Laboratory for Genomics, Evolution, and
>>>         Development (GED)
>>>         Michigan State University
>>>         P: 517-355-8733
>>
>>         -- 
>>         Jens-Konrad Preem, MSc, University of Tartu
>>
>>
>>
>>
>>
>>
>>     -- 
>>     Eric McDonald
>>     HPC/Cloud Software Engineer
>>       for the Institute for Cyber-Enabled Research (iCER)
>>       and the Laboratory for Genomics, Evolution, and Development (GED)
>>     Michigan State University
>>     P: 517-355-8733
>>
>>
>     OK.
>     I am posting a failed run, complete with the PBS script, error
>     log, and qstat -f snapshots at different times.
>     I find it weird that I managed to complete the test run on
>     iowa-corn50M, which had an even larger graph file. Might the
>     number of threads used pump up the memory? I used the sample
>     commands from the web page for the corn data set; those used at
>     most 4 threads.
>     Jens-Konrad Preem
>
>
>
>
>
> -- 
> Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733

-- 
Jens-Konrad Preem, MSc, University of Tartu

