[khmer] Fwd: partition-graph memory requirements
Jens-Konrad Preem
jpreem at ut.ee
Sat Apr 13 01:16:50 PDT 2013
Yes, the steady ballooning is quite obvious, especially if I spend some
time staring at top command output etc. Thank you for your time; I hope
someone will look at this stuff here. As a note, might my graafik.ht be
corrupted somehow? It is even smaller in size than the 50m.ht, which I
was able to partition without trouble. As additional information for
anybody interested: the data used was ~36M 250 bp reads.
Jens-Konrad
On 04/13/2013 05:35 AM, Eric McDonald wrote:
> Jens-Konrad,
>
> Thanks for providing this information.
> 15: resources_used.mem = 52379536kb
> 30: resources_used.mem = 90676068kb
> 45: resources_used.mem = 122543188kb
> Definitely some ballooning memory use there. (If those are snapshots
> at 15, 30, and 45 minutes, that is roughly 50 GB growing to about
> 117 GB within half an hour - on the order of 2 GB per minute.)
>
> One more thing you may wish to examine from the command line is:
> qmgr -c "l s" | grep 'resources_'
> This will tell you about any default resources (such as physical
> memory) that your PBS server is assigning to new jobs. That said, I do
> believe that your jobs are exhausting available memory.
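>
> For example, on a TORQUE/PBS server that typically prints lines like
> the following (the values here are invented, purely for illustration):
> resources_default.mem = 4gb
> resources_default.vmem = 8gb
>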
> So, now the question is whether anything can be done about it. Unless
> someone with more experience with the partitioning code decides to
> speak up, I am going to have to analyze your chosen parameters and the
> pieces of code in question to see if I can deduce anything. I might
> not be able to do this until Monday - I am too tired to do it tonight
> (here in US Eastern time) and have a busy weekend ahead of me.
>
> I promise I will get back to you with some better answers if no one
> else decides to say anything. While you are waiting for a response, if
> you want to test your hypothesis that the number of threads correlates
> with increased memory use, I would recommend using a smaller data set
> and seeing how the memory use scales as you change the number of
> threads.
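>
> A rough sketch of what I mean (the "small-test" graph name and the
> thread counts are placeholders; /usr/bin/time -v is the GNU time
> utility, which reports "Maximum resident set size" on Linux):
>
> for t in 1 2 4 8; do
>     # record the peak memory for each thread count in its own log
>     /usr/bin/time -v ./khmer/scripts/partition-graph.py \
>         --threads $t --subset-size 1e4 small-test \
>         2> mem-${t}-threads.log
> done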
>
> Have a good weekend,
> Eric
>
>
>
> On Fri, Apr 12, 2013 at 7:30 AM, Jens-Konrad Preem <jpreem at ut.ee>
> wrote:
>
> On 04/11/2013 02:58 AM, Eric McDonald wrote:
>> Forgot to reply to all; forwarding in case the answer will help
>> anyone else on the list...
>>
>> ---------- Forwarded message ----------
>> From: Eric McDonald <emcd.msu at gmail.com>
>> Date: Wed, Apr 10, 2013 at 7:57 PM
>> Subject: Re: [khmer] partition-graph memory requirements
>> To: Jens-Konrad Preem <jpreem at ut.ee>
>>
>>
>> Hi,
>>
>> Sorry for the delayed reply.
>>
>> Thanks for sharing your job scripts. I notice that you are
>> specifying the 'vmem' resource. However, if PBS is also enforcing
>> a limit on the 'mem' resource (physical memory), then you may be
>> encountering that limit. Do you know what default value is
>> assigned by your site's PBS server for the 'mem' resource?
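>>
>> If a low 'mem' default turns out to be the culprit, you can request
>> physical memory explicitly alongside 'vmem' in your job script. A
>> sketch (adjust the values to whatever your cluster allows):
>>
>> #PBS -l mem=240gb
>> #PBS -l vmem=240gb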
>>
>> Again, if you run:
>> qstat -f <job_id>
>> you should be able to determine both the resources allocated for
>> the job and how much the job is actually using. Please let us know
>> the results of this command if you would like help interpreting
>> them and figuring out how to change your PBS resource request, if
>> necessary.
>>
>> As a side note, smaller k-mer lengths mean that more k-mers are
>> extracted from each sequence. This means that the hash tables are
>> more densely populated, which in turn means that you are more
>> likely to need larger hash tables to avoid a significant false
>> positive rate. But perhaps the better way to put it is this: the
>> amount of memory used by the hash tables is independent of k-mer
>> size, so changing the k-mer length does not affect memory usage
>> for many parts of khmer. (I would have to look more closely to see
>> how this affects the partitioning code.)
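>>
>> If you want a back-of-the-envelope feel for the density/false-
>> positive trade-off, you can treat the tables roughly like a
>> partitioned Bloom filter: with n distinct k-mers spread over N
>> tables of m entries each, the false positive rate is about
>> (1 - exp(-n/m))^N. The numbers below are made-up placeholders, not
>> your actual table sizes:
>>
>> python -c "from math import exp; n,m,N = 4e9,6e9,4; print((1 - exp(-n/m))**N)"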
>>
>> Hope that helps,
>> Eric
>>
>>
>>
>> On Wed, Apr 10, 2013 at 4:23 AM, Jens-Konrad Preem
>> <jpreem at ut.ee> wrote:
>>
>> Hi,
>>
>> In an extreme act of foolishness I seem to have lost my error
>> logs. (I have been messing with the different scripts here a lot
>> and got rid of some of the outputs in an ill-thought-out
>> "housekeeping" event.)
>>
>> I am attaching a bunch of PBS scripts that I used to get as far
>> as I have. I used a separate script for each stage of the
>> normalize-and-partition pipeline, so I would have time to look at
>> the outputs and get a sense of the time taken by each. The
>> scripts are in the following order: supkhme (normalize), suprem
>> (filter-below), supload (load-graph), and finally supart
>> (partition-graph). (As can be seen, I am trying to do the
>> metagenome analysis as per the guide.txt; in outline the stages
>> look like the sketch below.)
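>>
>> (The script names below are the khmer ones the guide uses; the
>> arguments are just placeholders - my actual parameters are in the
>> attached scripts.)
>>
>> normalize-by-median.py ... reads.fastq
>> filter-below-abund.py ...
>> load-graph.py ... graafik ...
>> partition-graph.py --threads 24 --subset-size 1e4 graafik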
>> All the previous scripts completed without complaint, producing
>> the 5.2 GB "graafik" graph.
>>
>> The partition-graph step has failed a few times after running
>> for an hour or so, always with error messages concerning memory.
>> The latest script requests 240 GB of memory, which is the maximum
>> I can request in the near future, and it still failed with an
>> error message concerning memory.
>>
>> I am right now working on reproducing the error, so I can then
>> supply you with the .log and .error files - though if no error
>> occurs, so much the better for me, of course.
>> I decided to try different k-values this time, as suggested by
>> https://khmer.readthedocs.org/en/latest/guide.html (20 for
>> normalization and 32 for partitioning); those should make the
>> graph file all the bigger. I had used the smaller ones to avoid
>> running out of memory, but as that doesn't seem to help, what
>> the heck. ;D Right now I am at the load-graph stage with the new
>> set. As it will complete in a few hours, I'll start the
>> partition-graph run, and then we will see if it dies within an
>> hour. If so, I'll post a new set of scripts and logs.
>>
>> Thank you for your time,
>> Jens-Konrad
>>
>>
>>
>>
>> On 04/10/2013 04:18 AM, Eric McDonald wrote:
>>> Hi Jens-Konrad,
>>>
>>> Sorry for the delayed response. (I was on vacation yesterday
>>> and hoping that someone more familiar with the partitioning
>>> code would answer.)
>>>
>>> My understanding of the code is that decreasing the subset
>>> size will increase the number of partitions but will not
>>> change the overall graph coverage. Therefore, I would not
>>> expect it to lower the memory requirements. (The overhead from
>>> additional partitions might raise them somewhat, but I have
>>> not analyzed the code deeply enough to say one way or the
>>> other.) As far as changing the number of threads goes, each
>>> thread does seem to maintain a local list of traversed k-mers
>>> (hidden in the C++ implementation), but I do not yet know how
>>> much that impacts memory usage. Have you tried using fewer
>>> threads?
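>>>
>>> For example, you could re-run your exact command with fewer
>>> threads and otherwise identical parameters, and see whether
>>> the memory footprint shrinks:
>>>
>>> ./khmer/scripts/partition-graph.py --threads 4 --subset-size 1e4 graafik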
>>>
>>> But, rather than guessing about causation, let's try to get
>>> some more diagnostic information. Does the script die
>>> immediately? (How long does the PBS job execute before
>>> failure?) Can you attach the output and error files for a
>>> job, and also the job script? And what does
>>> qstat -f <job_id>
>>> (where <job_id> is the ID of your running job) tell you about
>>> memory usage?
>>>
>>> Thanks,
>>> Eric
>>>
>>>
>>>
>>>
>>> On Mon, Apr 8, 2013 at 3:34 AM, Jens-Konrad Preem
>>> <jpreem at ut.ee> wrote:
>>>
>>> Hi,
>>> I am having trouble completing a partition-graph.py job.
>>> No matter the configuration, it seems to terminate with
>>> error messages hinting at low memory, etc.*
>>> Does lowering the subset size reduce the memory use? What
>>> about lowering the number of parallel threads?
>>> The graafik.ht file is 5.2 GB; I had the script running as a
>>> PBS job with 240 GB RAM allocated. (That's as much as I can
>>> get; maybe I'll have an opportunity next week to double it,
>>> but I wouldn't count on it.)
>>> Is it expected for the script to require so much RAM, or is
>>> there some bug or some misuse on my part? Would there be any
>>> configuration to get past this?
>>>
>>> Jens-Konrad Preem, MSc., University of Tartu
>>>
>>>
>>>
>>> * The latest configuration, after I decided on a smaller
>>> subset size:
>>> ./khmer/scripts/partition-graph.py --threads 24 --subset-size 1e4 graafik
>>> terminated with:
>>> cannot allocate memory for thread-local data: ABORT
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Eric McDonald
>>> HPC/Cloud Software Engineer
>>> for the Institute for Cyber-Enabled Research (iCER)
>>> and the Laboratory for Genomics, Evolution, and
>>> Development (GED)
>>> Michigan State University
>>> P: 517-355-8733
>>
>> --
>> Jens-Konrad Preem, MSc, University of Tartu
>>
>>
>>
>>
>>
>>
>> --
>> Eric McDonald
>> HPC/Cloud Software Engineer
>> for the Institute for Cyber-Enabled Research (iCER)
>> and the Laboratory for Genomics, Evolution, and Development (GED)
>> Michigan State University
>> P: 517-355-8733
>>
>>
>>
> OK.
> I am posting a failed run, complete with the PBS script, the error
> log, and qstat -f snapshots at different times.
> I find it weird that I managed to complete the test run on
> iowa-corn50M, which had an even larger graph file. Might the number
> of threads used pump up the memory? I used the sample commands from
> the web page for the corn data; those used at most 4 threads.
> Jens-Konrad Preem
>
>
>
>
>
> --
> Eric McDonald
> HPC/Cloud Software Engineer
> for the Institute for Cyber-Enabled Research (iCER)
> and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733
--
Jens-Konrad Preem, MSc, University of Tartu