[khmer] Fwd: parition-graph memory requirements
C. Titus Brown
ctb at msu.edu
Fri Apr 26 20:56:02 PDT 2013
Hi Jens-Konrad,
apologies for the late response. It's been quite a month.
partition-graph *really* shouldn't be doing that; the remaining memory
ballooning script in there is find-knots, not partition-graph :). I
looked at your scripts and didn't see anything obviously problematic,
but then again this shouldn't be happening at all.
Could you try with --no-big-traverse and tell me what happens?
I don't suppose you can share a problematic data set with me?
thanks,
--titus
On Sat, Apr 13, 2013 at 11:16:50AM +0300, Jens-Konrad Preem wrote:
> Yes, the steady ballooning is quite obvious, especially if I spend some
> time staring at the output of the top command. Thank you for your time; I
> hope someone will look into this. As a note, might my graafik.ht be
> corrupted somehow? It is even smaller than the 50m.ht, which I was able
> to partition without trouble. As additional information for anybody
> interested, the data used was ~36M 250 bp reads.
> Jens-Konrad
> On 04/13/2013 05:35 AM, Eric McDonald wrote:
>> Jens-Konrad,
>>
>> Thanks for providing this information.
>> 15: resources_used.mem = 52379536kb
>> 30: resources_used.mem = 90676068kb
>> 45: resources_used.mem = 122543188kb
>> Definitely some ballooning memory use there.
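>> Those snapshots can be sanity-checked with a few lines of Python (a
>> sketch; it just converts qstat's 1024-based kb figures to GB and shows
>> the growth between successive snapshots):

```python
# Convert PBS "resources_used.mem" samples (reported in kb) to GB and
# show the growth between successive snapshots.
samples_kb = [52379536, 90676068, 122543188]  # the values quoted above

def kb_to_gb(kb):
    return kb / (1024 ** 2)

gb = [kb_to_gb(v) for v in samples_kb]
growth = [round(b - a, 1) for a, b in zip(gb, gb[1:])]
print([round(g, 1) for g in gb])   # [50.0, 86.5, 116.9]
print(growth)                      # [36.5, 30.4]
```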
>>
>> One more thing you may wish to examine from the command line is:
>> qmgr -c "l s" | grep 'resources_'
>> This will tell you about any default resources (such as physical
>> memory) that your PBS server is assigning to new jobs. That said, I do
>> believe that your jobs are exhausting available memory.
>> So, now the question is whether anything can be done about it. Unless
>> someone with more experience with the partitioning code decides to
>> speak up, I am going to have to analyze your chosen parameters and the
>> pieces of code in question to see if I can deduce anything. I might
>> not be able to do this until Monday - I am too tired to do it tonight
>> (here in US Eastern time) and have a busy weekend ahead of me.
>>
>> I promise I will get back to you with some better answers if no one
>> else decides to say anything. While you are waiting for a response and
>> if you want to test your hypothesis about the number of threads
>> correlating to increased memory use, then I would recommend using a
>> smaller data set and seeing what kind of scaling in the memory use you
>> see as you change the number of threads.
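>> Once you have the measurements, the scaling check is a few lines of
>> Python (a sketch with placeholder numbers, not real measurements;
>> linear_fit is a hypothetical helper, not part of khmer):

```python
# Sketch of the suggested experiment: run partition-graph on a small
# data set at several thread counts, record peak memory, and check
# whether memory grows roughly linearly with the number of threads.
def linear_fit(xs, ys):
    # Ordinary least-squares slope and intercept, no numpy needed.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical measurements: threads -> peak memory in GB (placeholders).
threads = [4, 8, 16, 24]
peak_gb = [22.0, 41.0, 80.0, 118.0]
slope, intercept = linear_fit(threads, peak_gb)
print(round(slope, 1))  # approximate extra GB per added thread: 4.8
```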
>>
>> Have a good weekend,
>> Eric
>>
>>
>>
>> On Fri, Apr 12, 2013 at 7:30 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>>
>> On 04/11/2013 02:58 AM, Eric McDonald wrote:
>>> Forgot to reply to all, in case the answer will help anyone else
>>> on the list....
>>>
>>> ---------- Forwarded message ----------
>>> From: *Eric McDonald* <emcd.msu at gmail.com>
>>> Date: Wed, Apr 10, 2013 at 7:57 PM
>>> Subject: Re: [khmer] parition-graph memory requirements
>>> To: Jens-Konrad Preem <jpreem at ut.ee>
>>>
>>>
>>> Hi,
>>>
>>> Sorry for the delayed reply.
>>>
>>> Thanks for sharing your job scripts. I notice that you are
>>> specifying the 'vmem' resource. However, if PBS is also enforcing
>>> a limit on the 'mem' resource (physical memory), then you may be
>>> encountering that limit. Do you know what default value is
>>> assigned by your site's PBS server for the 'mem' resource?
>>>
>>> Again, if you run:
>>> qstat -f <job_id>
>>> you should be able to determine both the resources allocated for
>>> the job and how much the job is actually using. Please let us
>>> know the results of this command, if you would like help
>>> interpreting them and figuring out how to change your PBS
>>> resource request, if necessary.
>>>
>>> As a side note, smaller k-mer lengths mean that more k-mers are
>>> being extracted from each sequence. This means that the hash
>>> tables are being more densely populated. And, that means that you
>>> are more likely to need larger hash tables to avoid a significant
>>> false positive rate. But, I think a better thing to say is that
>>> the amount of memory used by the hash tables is independent of
>>> k-mer size. So, changing k-mer length does not affect memory
>>> usage for many parts of khmer. (I would have to look more closely
>>> to see how this affects the partitioning code.)
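>>> To make that concrete, here is a sketch using the standard
>>> Bloom-filter occupancy approximation (the exact behavior of khmer's
>>> hash tables may differ):

```python
import math

def kmers_per_read(read_len, k):
    # A read of length L yields L - k + 1 overlapping k-mers.
    return read_len - k + 1

def approx_false_positive_rate(n_kmers, table_size, n_tables):
    # Bloom-filter-style estimate: each table's occupancy is about
    # 1 - exp(-n/m); a novel k-mer is a false positive only if it
    # collides in every one of the n_tables tables.
    occupancy = 1.0 - math.exp(-n_kmers / table_size)
    return occupancy ** n_tables

# Smaller k means more k-mers per 250 bp read, hence denser tables and
# a higher false positive rate -- but the tables themselves occupy the
# same fixed amount of memory regardless of k.
print(kmers_per_read(250, 20))  # 231
print(kmers_per_read(250, 32))  # 219
print(approx_false_positive_rate(2e9, 4e9, 4)
      < approx_false_positive_rate(3e9, 4e9, 4))  # True
```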
>>>
>>> Hope that helps,
>>> Eric
>>>
>>>
>>>
>>> On Wed, Apr 10, 2013 at 4:23 AM, Jens-Konrad Preem <jpreem at ut.ee> wrote:
>>>
>>> Hi,
>>>
>>> In an extreme act of foolishness, I seem to have lost my
>>> error logs. (I have been messing with the different scripts
>>> here a lot and got rid of some of the outputs in an
>>> ill-thought-out "housekeeping" event.)
>>>
>>> I attach here a bunch of PBS scripts that I used to get as
>>> far as I am. I used a separate script for each step of the
>>> normalize-and-partition pipeline, so I would have time to
>>> look at the outputs and get a sense of the time taken by
>>> each. The scripts are in the following order: supkhme
>>> (normalize), suprem (filter-below), supload (load-graph),
>>> and finally supart (partition-graph). (As can be seen, I am
>>> trying to do the metagenome analysis as per the guide.txt.)
>>> All the previous scripts completed without complaint,
>>> producing the 5.2 GB "graafik" graph.
>>>
>>> The partition-graph step had failed a few times after running
>>> an hour or so, always with error messages concerning memory.
>>> The latest script there requests 240 GB of memory, which is
>>> the maximum I can request in the near future, and it still
>>> failed with an error message concerning memory.
>>>
>>> I am right now working on reproducing the error, so that I
>>> can supply you with the .log and .error files - though if no
>>> error occurs, so much the better for me, of course.
>>> I decided to try different k-values this time, as suggested by
>>> https://khmer.readthedocs.org/en/latest/guide.html (20 for
>>> normalization, and 32 for partitioning). Those should make the
>>> graph file all the bigger - I had used the smaller values to
>>> avoid running out of memory, but as that doesn't seem to help,
>>> what the heck. ;D Right now I am at the load-graph stage with
>>> the new set. As it will complete in a few hours, I'll start
>>> partition-graph and then we will see if it dies within an
>>> hour. If so, I'll post a new set of scripts and logs.
>>>
>>> Thank you for your time,
>>> Jens-Konrad
>>>
>>>
>>>
>>>
>>> On 04/10/2013 04:18 AM, Eric McDonald wrote:
>>>> Hi Jens-Konrad,
>>>>
>>>> Sorry for the delayed response. (I was on vacation yesterday
>>>> and hoping that someone more familiar with the partitioning
>>>> code would answer.)
>>>>
>>>> My understanding of the code is that decreasing the subset
>>>> size will increase the number of partitions but will not
>>>> change the overall graph coverage. Therefore, I would not
>>>> expect it to lower memory requirements. (The overhead from
>>>> additional partitions might raise them some, but I have not
>>>> analyzed the code deeply enough to say one way or another
>>>> about that.) As far as changing the number of threads goes,
>>>> each thread does seem to maintain a local list of traversed
>>>> k-mers (hidden in the C++ implementation) but I do not yet
>>>> know how much that would impact memory usage. Have you tried
>>>> using fewer threads?
>>>>
>>>> But, rather than guessing about causation, let's try to get
>>>> some more diagnostic information. Does the script die
>>>> immediately? (How long does the PBS job execute before
>>>> failure?) Can you attach the output and error files for a
>>>> job, and also the job script? What does
>>>> qstat -f <job_id>
>>>> where <job_id> is the ID of your running job, tell you about
>>>> memory usage?
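>>>> Extracting the memory fields from qstat -f output can be
>>>> scripted, e.g. (a sketch; it assumes the usual key = value
>>>> layout of PBS qstat output, and the sample text is invented
>>>> for illustration):

```python
# Pull the memory-related fields out of `qstat -f` output.
sample = """\
Job Id: 12345.cluster
    resources_used.mem = 52379536kb
    resources_used.vmem = 61234567kb
    Resource_List.mem = 245760mb
"""

def memory_fields(qstat_text):
    # Collect every "key = value" line whose key mentions memory.
    fields = {}
    for line in qstat_text.splitlines():
        line = line.strip()
        if "=" in line and "mem" in line.split("=")[0]:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    return fields

print(memory_fields(sample))
```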
>>>>
>>>> Thanks,
>>>> Eric
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 8, 2013 at 3:34 AM, Jens-Konrad Preem
>>>> <jpreem at ut.ee> wrote:
>>>>
>>>> Hi,
>>>> I am having trouble completing a partition-graph.py job.
>>>> No matter the configuration, it seems to terminate with
>>>> error messages hinting at low memory, etc.*
>>>> Does lowering the subset size reduce memory use, and what
>>>> about lowering the number of parallel threads?
>>>> The graafik.ht file is 5.2 GB; I had the script running as
>>>> a PBS job with 240 GB of RAM allocated. (That's as much as
>>>> I can get; maybe I'll have an opportunity next week to
>>>> double it, but I wouldn't count on it.)
>>>> Is it expected for the script to require so much RAM, or is
>>>> there some bug or some misuse on my part? Is there any
>>>> configuration that would get past this?
>>>>
>>>> Jens-Konrad Preem, MSc., University of Tartu
>>>>
>>>>
>>>>
>>>> * the latest configuration, after I decided to try a smaller
>>>> subset size:
>>>> ./khmer/scripts/partition-graph.py --threads 24
>>>> --subset-size 1e4 graafik
>>>> terminated with
>>>> cannot allocate memory for thread-local data: ABORT
>>>>
>>>>
>>>> _______________________________________________
>>>> khmer mailing list
>>>> khmer at lists.idyll.org
>>>> http://lists.idyll.org/listinfo/khmer
>>>>
>>>>
>>>>
>>>>
>>>> -- Eric McDonald
>>>> HPC/Cloud Software Engineer
>>>> for the Institute for Cyber-Enabled Research (iCER)
>>>> and the Laboratory for Genomics, Evolution, and
>>>> Development (GED)
>>>> Michigan State University
>>>> P: 517-355-8733
>>>
>>> -- Jens-Konrad Preem, MSc, University of Tartu
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- Eric McDonald
>>> HPC/Cloud Software Engineer
>>> for the Institute for Cyber-Enabled Research (iCER)
>>> and the Laboratory for Genomics, Evolution, and Development (GED)
>>> Michigan State University
>>> P: 517-355-8733
>>>
>>>
>>>
>>>
>> OK.
>> I am posting a failed run, complete with the PBS script, the error
>> log, and qstat -f snapshots at different times.
>> I find it weird that I managed to complete the test run on
>> iowa-corn50M, which had an even larger graph file. Might the number
>> of threads used pump up the memory? I used the sample commands from
>> the web page for the corn data set; those used at most 4 threads.
>> Jens-Konrad Preem
>>
>>
>>
>>
>>
>> --
>> Eric McDonald
>> HPC/Cloud Software Engineer
>> for the Institute for Cyber-Enabled Research (iCER)
>> and the Laboratory for Genomics, Evolution, and Development (GED)
>> Michigan State University
>> P: 517-355-8733
>
> --
> Jens-Konrad Preem, MSc, University of Tartu
>
--
C. Titus Brown, ctb at msu.edu