<div dir="ltr"><div style>Jens-Konrad,</div><div><br></div>Thanks for providing this information.<div> 15: <span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 52379536kb</span></div><div><font color="#000000"><span style="white-space:pre-wrap"> 30: </span></font><span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 90676068kb</span></div>
<div><font color="#000000"><span style="white-space:pre-wrap"> 45: </span></font><span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 122543188kb</span></div><div><font color="#000000"><span style="white-space:pre-wrap">Definitely some ballooning memory use there.<br>
</span></font><div><br></div><div style>One more thing you may wish to examine from the command line is:</div><div style> qmgr -c "l s" | grep 'resources_'</div><div style>This will tell you about any default resources (such as physical memory) that your PBS server is assigning to new jobs. That said, I do believe that your jobs are exhausting available memory.</div>
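<div>To put numbers on that growth, here is a rough Python sketch that converts the three snapshots above to GiB and computes the growth between them. It assumes that the labels 15, 30 and 45 are minutes into the run and that a PBS "kb" is 1024 bytes; both are my assumptions, and the helper names are made up for illustration.</div>
<pre>
```python
import re

# qstat -f excerpts quoted above; the leading number is the snapshot label
# (assumed here to be minutes into the run).
SNAPSHOTS = """\
15: resources_used.mem = 52379536kb
30: resources_used.mem = 90676068kb
45: resources_used.mem = 122543188kb
"""

PATTERN = re.compile(r"(\d+):\s*resources_used\.mem\s*=\s*(\d+)kb")


def parse_snapshots(text):
    """Return a list of (label, kb) integer pairs found in text."""
    return [(int(label), int(kb)) for label, kb in PATTERN.findall(text)]


def kb_to_gib(kb):
    """Convert PBS 'kb' to GiB, assuming 1 kb = 1024 bytes."""
    return kb / 1024 ** 2


def growth_rates(points):
    """GiB gained per label unit between consecutive snapshots."""
    return [kb_to_gib(kb1 - kb0) / (t1 - t0)
            for (t0, kb0), (t1, kb1) in zip(points, points[1:])]


points = parse_snapshots(SNAPSHOTS)
rates = growth_rates(points)
for label, kb in points:
    print(f"{label}: {kb_to_gib(kb):.1f} GiB")
for rate in rates:
    print(f"growth: {rate:.2f} GiB per label unit")
```
</pre>
<div>By that arithmetic the job holds roughly 50, 86 and 117 GiB at the three snapshots, so the growth had not levelled off by the last one.</div>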
</div><div style>So, now the question is whether anything can be done about it. Unless someone with more experience with the partitioning code decides to speak up, I am going to have to analyze your chosen parameters and the pieces of code in question to see if I can deduce anything. I might not be able to do this until Monday - I am too tired to do it tonight (here in US Eastern time) and have a busy weekend ahead of me. </div>
<div style><br></div><div style>I promise I will get back to you with some better answers if no one else decides to say anything. While you are waiting for a response, if you want to test your hypothesis that the number of threads correlates with increased memory use, I would recommend using a smaller data set and seeing how the memory use scales as you change the number of threads.</div>
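<div>One way to run that thread-scaling test is sketched below. It launches partition-graph.py once per thread count and records each child's approximate peak resident memory via os.wait4. The script path and flags are copied from your earlier command; the helper names are hypothetical, and ru_maxrss is reported in kilobytes on Linux (bytes on macOS), so treat this as a starting point rather than a finished tool.</div>
<pre>
```python
import os
import subprocess

# Path and flags as used earlier in this thread; adjust for your checkout.
PARTITION_SCRIPT = "./khmer/scripts/partition-graph.py"


def build_command(threads, graph_base, subset_size="1e4"):
    """Build the partition-graph.py command line for one thread count."""
    return [PARTITION_SCRIPT, "--threads", str(threads),
            "--subset-size", subset_size, graph_base]


def peak_rss(cmd):
    """Run cmd to completion; return (exit_code, peak RSS of that child)."""
    proc = subprocess.Popen(cmd)
    # os.wait4 returns the resource usage of this specific child process.
    _, status, usage = os.wait4(proc.pid, 0)
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
    else:
        code = -os.WTERMSIG(status)  # child was killed by a signal
    proc.returncode = code  # tell Popen the child was already reaped
    return code, usage.ru_maxrss


def scan_threads(thread_counts, graph_base):
    """Return {threads: (exit_code, peak_rss)} for each thread count."""
    results = {}
    for n in thread_counts:
        results[n] = peak_rss(build_command(n, graph_base))
    return results
```
</pre>
<div>For example, scan_threads([1, 2, 4, 8], "small_graph"), where small_graph stands in for whatever reduced graph you build, would run four jobs in sequence. If the peak memory climbs roughly in step with the thread count, that would support the thread-local explanation.</div>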
<div style><br></div><div style>Have a good weekend,</div><div style> Eric</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Apr 12, 2013 at 7:30 AM, Jens-Konrad Preem <span dir="ltr"><<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 04/11/2013 02:58 AM, Eric McDonald
wrote:<br>
</div>
</div><div><div class="h5"><blockquote type="cite">
<div dir="ltr">Forgot to reply to all, in case the answer will
help anyone else on the list....<br>
<br>
<div class="gmail_quote">---------- Forwarded message ----------<br>
From: <b class="gmail_sendername">Eric McDonald</b> <span dir="ltr"><<a href="mailto:emcd.msu@gmail.com" target="_blank">emcd.msu@gmail.com</a>></span><br>
Date: Wed, Apr 10, 2013 at 7:57 PM<br>
Subject: Re: [khmer] parition-graph memory requirements<br>
To: Jens-Konrad Preem <<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>><br>
<br>
<br>
<div dir="ltr">Hi,
<div><br>
</div>
<div>
Sorry for the delayed reply.</div>
<div><br>
</div>
<div>Thanks for sharing your job scripts. I notice that you
are specifying the 'vmem' resource. However, if PBS is
also enforcing a limit on the 'mem' resource (physical
memory), then you may be encountering that limit. Do you
know what default value is assigned by your site's PBS
server for the 'mem' resource?</div>
<div><br>
</div>
<div>Again, if you run:</div>
<div> qstat -f &lt;job_id&gt;</div>
<div>you should be able to determine both the resources
allocated for the job and how much the job is actually
using. Please let us know the results of this command, if
you would like help interpreting them and figuring out how
to change your PBS resource request, if necessary.</div>
<div><br>
</div>
<div>As a side note, smaller k-mer lengths mean that more
k-mers are being extracted from each sequence. This means
that the hash tables are being more densely populated.
And, that means that you are more likely to need larger
hash tables to avoid a significant false positive rate.
But, I think a better thing to say is that the amount of
memory used by the hash tables is independent of k-mer
size. So, changing k-mer length does not affect memory
usage for many parts of khmer. (I would have to look more
closely to see how this affects the partitioning code.)</div>
<div><br>
</div>
<div>Hope that helps,</div>
<div> Eric</div>
<div><br>
</div>
</div>
<div>
<div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Wed, Apr 10, 2013 at 4:23
AM, Jens-Konrad Preem <span dir="ltr"><<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> Hi,<br>
<br>
In an extreme act of foolishness I do seem to have
lost my error logs. (I have been messing with the
different scripts here a lot and so got rid of
some of the outputs, in some ill-thought-out
"housekeeping" event).<br>
<br>
I do attach here a bunch of PBS scripts that I
used to get as far as I am. I did use a different
script for most of the normalize and partition
pipeline, so I'd have time to look at the outputs
and get a sense of the time taken for each. The
scripts are in the following order:
supkhme(normalize), suprem(filter-below),
supload(load-graph), and finally
supart(partition-graph). (As can be seen I try to
do the meta-genome analysis as per the guide.txt)<br>
All the previous scripts completed without
complaint, producing the 5.2 GB "graafik" graph.<br>
<br>
The partition-graph step had failed a few times after
running for an hour or so, always with error messages
concerning memory. The latest script there
requests 240 GB of memory, which is the maximum I can
request in the near future, and it still failed with
an error message concerning memory.<br>
<br>
I am currently working on reproducing the error
so that I can supply you with the .logs and .error
files; if no error occurs, so much the better for me, of
course.<br>
I decided to try different k-values this time as
suggested by <a href="https://khmer.readthedocs.org/en/latest/guide.html" target="_blank">https://khmer.readthedocs.org/en/latest/guide.html</a>
(20 for normalization, and 32 for partitioning)
those should make the graph file all the bigger -
I used the smaller ones to avoid running out of
memory but as it doesn't seem to help then what
the heck. ;D. Right now I am at the load-graph
stage with the new set. As it will complete in a few
hours I'll put the partition-graph on the run and
then we will see if it dies within an hour. If so
I'll post a new set of scripts and logs.<br>
<br>
Thank you for your time,<br>
Jens-Konrad
<div>
<div><br>
<br>
<br>
<br>
<div>On 04/10/2013 04:18 AM, Eric McDonald
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Jens-Konrad,
<div><br>
</div>
<div>Sorry for the delayed response. (I
was on vacation yesterday and hoping
that someone more familiar with the
partitioning code would answer.)</div>
<div><br>
</div>
<div>My understanding of the code is that
decreasing the subset size will increase
the number of partitions but will not
change the overall graph coverage.
Therefore, I would not expect it to
lower memory requirements. (The overhead
from additional partitions might raise
them some, but I have not analyzed the
code deeply enough to say one way or
another about that.) As far as changing
the number of threads goes, each thread
does seem to maintain a local list of
traversed k-mers (hidden in the C++
implementation) but I do not yet know
how much that would impact memory usage.
Have you tried using fewer
threads?</div>
<div><br>
</div>
<div>But, rather than guessing about
causation, let's try to get some more
diagnostic information. Does the script
die immediately? (How long does the PBS
job execute before failure?) Can you
attach the output and error files for a
job, and also the job script? What does</div>
<div> qstat -f &lt;job_id&gt;</div>
<div>where &lt;job_id&gt; is the ID of
your running job, tell you about memory
usage?</div>
<div><br>
</div>
<div>Thanks,</div>
<div> Eric</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Mon, Apr 8,
2013 at 3:34 AM, Jens-Konrad Preem <span dir="ltr"><<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
I am having trouble with completing a
partition-graph.py job.<br>
No matter the configuration, it seems
to terminate with error messages
hinting at low memory etc. *<br>
Does LOWering the subset size reduce
the memory use? What about LOWering
the number of parallel threads?<br>
The <a href="http://graafik.ht" target="_blank">graafik.ht</a> is
5.2G large, I had the script running
as a PBS job with 240 GB RAM
allocated. (That's as much as I can
get it, maybe I'll have an opportunity
in the next week to double it, but I
wouldn't count on it).<br>
Is it expected for the script to
require so much RAM, or is there some
bug or some misuse on my part? Would
there be any configuration to get past
this?<br>
<br>
Jens-Konrad Preem, MSc., University of
Tartu<br>
<br>
<br>
<br>
* the latest configuration after I
thought on smaller subset size<br>
./khmer/scripts/partition-graph.py
--threads 24 --subset-size 1e4
graafik<br>
terminated with<br>
cannot allocate memory for
thread-local data: ABORT<br>
<br>
<br>
_______________________________________________<br>
khmer mailing list<br>
<a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a><br>
<a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div>Eric McDonald</div>
<div>HPC/Cloud Software Engineer</div>
<div> for the Institute for
Cyber-Enabled Research (iCER)</div>
<div> and the Laboratory for Genomics,
Evolution, and Development (GED)</div>
<div>Michigan State University</div>
<div>P: <a href="tel:517-355-8733" value="+15173558733" target="_blank">517-355-8733</a></div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
<span><font color="#888888">
<pre cols="72">--
Jens-Konrad Preem, MSc, University of Tartu</pre>
</font></span></div>
<br>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
</div>
<br>
<br clear="all">
<div><br>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
</blockquote></div></div>
OK.<br>
I am posting a failed run, complete with PBS script, error log, and
qstat -f snapshots taken at different times.<br>
I find it odd that I managed to complete the test run on
iowa-corn50M, which had an even larger graph file. Might the number of
threads used drive up the memory use? I used the sample commands from the
web page for corn. These used at most 4 threads. <br><span class="HOEnZb"><font color="#888888">
Jens-Konrad Preem<br>
</font></span></div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Eric McDonald</div><div>HPC/Cloud Software Engineer</div><div> for the Institute for Cyber-Enabled Research (iCER)</div><div> and the Laboratory for Genomics, Evolution, and Development (GED)</div>
<div>Michigan State University</div><div>P: 517-355-8733</div></div>
</div>