<div dir="ltr"><div style>Jens-Konrad,</div><div><br></div>Thanks for providing this information.<div> 15:  <span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 52379536kb</span></div><div><font color="#000000"><span style="white-space:pre-wrap"> 30:  </span></font><span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 90676068kb</span></div>
<div><font color="#000000"><span style="white-space:pre-wrap"> 45:  </span></font><span style="color:rgb(0,0,0);white-space:pre-wrap">resources_used.mem = 122543188kb</span></div><div><font color="#000000"><span style="white-space:pre-wrap">Definitely some ballooning memory use there.<br>
</span></font><div><br></div><div style>One more thing you may wish to examine from the command line is:</div><div style>  qmgr -c &quot;l s&quot; | grep &#39;resources_&#39;</div><div style>This will tell you about any default resources (such as physical memory) that your PBS server is assigning to new jobs. That said, I do believe that your jobs are exhausting available memory.</div>
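<div><br></div><div>As a rough aid for reading the three resources_used.mem snapshots quoted at the top of this message, here is a minimal Python sketch that converts them to GiB and estimates the growth between snapshots. It assumes that the labels 15, 30 and 45 are minutes into the run and that PBS reports "kb" in units of 1024 bytes. The helper names are hypothetical and are not part of PBS or khmer.</div>
<pre>
```python
# Snapshots of resources_used.mem copied from the qstat -f output above.
# Assumption: the keys are minutes since the job started.
SNAPSHOTS_KB = {15: 52379536, 30: 90676068, 45: 122543188}

KB_PER_GIB = 1024 * 1024  # assumption: PBS "kb" means 1024 bytes


def kb_to_gib(kb):
    """Convert a PBS kilobyte count to GiB."""
    return kb / KB_PER_GIB


def growth_per_minute(snapshots_kb):
    """Return (start, end, GiB per minute) for consecutive snapshots."""
    times = sorted(snapshots_kb)
    rates = []
    for start, end in zip(times, times[1:]):
        delta_gib = kb_to_gib(snapshots_kb[end] - snapshots_kb[start])
        rates.append((start, end, delta_gib / (end - start)))
    return rates


def report(snapshots_kb):
    """Build a printable summary of usage and growth."""
    lines = []
    for minute in sorted(snapshots_kb):
        gib = kb_to_gib(snapshots_kb[minute])
        lines.append("%3d min: %6.1f GiB" % (minute, gib))
    for start, end, rate in growth_per_minute(snapshots_kb):
        lines.append("%d-%d min: %.2f GiB per minute" % (start, end, rate))
    return "\n".join(lines)


print(report(SNAPSHOTS_KB))
```
</pre>
<div>Under those assumptions, the job grew by roughly 2 to 2.5 GiB per minute between the snapshots.</div>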
</div><div style>So, now the question is whether anything can be done about it. Unless someone with more experience with the partitioning code decides to speak up, I am going to have to analyze your chosen parameters and the pieces of code in question to see if I can deduce anything. I might not be able to do this until Monday - I am too tired to do it tonight (here in US Eastern time) and have a busy weekend ahead of me. </div>
<div style><br></div><div style>I promise I will get back to you with some better answers if no one else decides to say anything. While you are waiting for a response, if you want to test your hypothesis that memory use increases with the number of threads, I would recommend using a smaller data set and seeing how the memory use scales as you change the number of threads.</div>
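<div><br></div><div>For the thread-scaling test, one possible approach is sketched below in Python. It runs a command once per thread count and records the peak resident memory of that child process via os.wait4. This assumes Linux, where ru_maxrss is reported in kilobytes. The script path and flags are the ones from the command quoted earlier in this thread; "small_graph" is only a placeholder for a reduced data set, and the function names are hypothetical.</div>
<pre>
```python
import os
import subprocess


def peak_rss_kb(cmd):
    """Run cmd and return (exit_code, peak resident set size of that child).

    Assumption: Linux, where ru_maxrss is reported in kilobytes.
    """
    proc = subprocess.Popen(cmd)
    # os.wait4 reports resource usage for this specific child process.
    _, status, usage = os.wait4(proc.pid, 0)
    if os.WIFEXITED(status):
        exit_code = os.WEXITSTATUS(status)
    else:
        exit_code = -os.WTERMSIG(status)  # killed by a signal
    proc.returncode = exit_code  # tell Popen the child was already reaped
    return exit_code, usage.ru_maxrss


def partition_cmd(threads, graph="small_graph"):
    """Build a partition-graph.py command line.

    "small_graph" is a placeholder for a reduced data set; the script path
    and flags are copied from the command quoted earlier in this thread.
    """
    return ["./khmer/scripts/partition-graph.py",
            "--threads", str(threads),
            "--subset-size", "1e4",
            graph]


def thread_scaling(thread_counts, make_cmd=partition_cmd):
    """Return a list of (threads, exit_code, peak_rss_kb) tuples."""
    results = []
    for threads in thread_counts:
        exit_code, rss_kb = peak_rss_kb(make_cmd(threads))
        results.append((threads, exit_code, rss_kb))
    return results


# Example (not run here):
#   for row in thread_scaling([1, 2, 4, 8]):
#       print(row)
```
</pre>
<div>If the peak grows roughly in proportion to the thread count, that would support the per-thread overhead hypothesis. The numbers could also be compared against what qstat -f reports for the same runs.</div>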
<div style><br></div><div style>Have a good weekend,</div><div style>  Eric</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Apr 12, 2013 at 7:30 AM, Jens-Konrad Preem <span dir="ltr">&lt;<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div class="im">
    <div>On 04/11/2013 02:58 AM, Eric McDonald
      wrote:<br>
    </div>
    </div><div><div class="h5"><blockquote type="cite">
      <div dir="ltr">Forgot to reply to all, in case the answer will
        help anyone else on the list....<br>
        <br>
        <div class="gmail_quote">---------- Forwarded message ----------<br>
          From: <b class="gmail_sendername">Eric McDonald</b> <span dir="ltr">&lt;<a href="mailto:emcd.msu@gmail.com" target="_blank">emcd.msu@gmail.com</a>&gt;</span><br>
          Date: Wed, Apr 10, 2013 at 7:57 PM<br>
          Subject: Re: [khmer] parition-graph memory requirements<br>
          To: Jens-Konrad Preem &lt;<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>&gt;<br>
          <br>
          <br>
          <div dir="ltr">Hi,
            <div><br>
            </div>
            <div>
              Sorry for the delayed reply.</div>
            <div><br>
            </div>
            <div>Thanks for sharing your job scripts. I notice that you
              are specifying the &#39;vmem&#39; resource. However, if PBS is
              also enforcing a limit on the &#39;mem&#39; resource (physical
              memory), then you may be encountering that limit. Do you
              know what default value is assigned by your site&#39;s PBS
              server for the &#39;mem&#39; resource?</div>
            <div><br>
            </div>
            <div>Again, if you run:</div>
            <div>  qstat -f &lt;job_id&gt;</div>
            <div>you should be able to determine both the resources
              allocated for the job and how much the job is actually
              using. Please let us know the results of this command, if
              you would like help interpreting them and figuring out how
              to change your PBS resource request, if necessary.</div>
            <div><br>
            </div>
            <div>As a side note, smaller k-mer lengths mean that more
              k-mers are being extracted from each sequence. This means
              that the hash tables are being more densely populated.
              And, that means that you are more likely to need larger
              hash tables to avoid a significant false positive rate.
              But, I think a better thing to say is that the amount of
              memory used by the hash tables is independent of k-mer
              size. So, changing k-mer length does not affect memory
              usage for many parts of khmer. (I would have to look more
              closely to see how this affects the partitioning code.)</div>
            <div><br>
            </div>
            <div>Hope that helps,</div>
            <div>  Eric</div>
            <div><br>
            </div>
          </div>
          <div>
            <div>
              <div class="gmail_extra"><br>
                <br>
                <div class="gmail_quote">On Wed, Apr 10, 2013 at 4:23
                  AM, Jens-Konrad Preem <span dir="ltr">&lt;<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>&gt;</span> wrote:<br>
                  <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                    <div bgcolor="#FFFFFF" text="#000000"> Hi,<br>
                      <br>
                      In an extreme act of foolishness I seem to have
                      lost my error logs. (I have been changing the
                      different scripts here a lot and got rid of
                      some of the outputs in an ill-thought-out
                      &quot;housekeeping&quot; event.)<br>
                      <br>
                      I do attach here a bunch of PBS scripts that I
                      used to get as far as I am. I did use a different
                      script for most of the normalize and partition
                      pipeline, so I&#39;d have time to look at the outputs
                      and get a sense of the time taken for each. The
                      scripts are in the following order:
                      supkhme (normalize), suprem (filter-below),
                      supload (load-graph), and finally
                      supart (partition-graph). (As can be seen, I am trying to
                      do the metagenome analysis as per guide.txt.)<br>
                      All the previous scripts completed without
                      complaint, producing the 5.2 GB &quot;graafik&quot; graph.<br>
                      <br>
                      The partition-graph job had failed a few times after
                      running for an hour or so, always with error messages
                      concerning memory. The latest script there
                      requests 240 GB of memory, which is the maximum I can
                      request in the near future, and it still failed with
                      an error message concerning memory.<br>
                      <br>
                      I am right now working on reproducing the error,
                      so I can then supply you with .logs and .error
                      files; if no error occurs, all the better for me, of
                      course.<br>
                      I decided to try different k-values this time, as
                      suggested by <a href="https://khmer.readthedocs.org/en/latest/guide.html" target="_blank">https://khmer.readthedocs.org/en/latest/guide.html</a>
                      (20 for normalization, and 32 for partitioning);
                      those should make the graph file all the bigger.
                      I used the smaller ones to avoid running out of
                      memory, but as that doesn&#39;t seem to help, what
                      the heck. ;D Right now I am at the load-graph
                      stage with the new set. Once it completes in a few
                      hours, I&#39;ll start the partition-graph run and
                      then we will see if it dies within an hour. If so,
                      I&#39;ll post a new set of scripts and logs.<br>
                      <br>
                      Thank you for your time,<br>
                      Jens-Konrad
                      <div>
                        <div><br>
                          <br>
                          <br>
                          <br>
                          <div>On 04/10/2013 04:18 AM, Eric McDonald
                            wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <div dir="ltr">Hi Jens-Konrad,
                              <div><br>
                              </div>
                              <div>Sorry for the delayed response. (I
                                was on vacation yesterday and hoping
                                that someone more familiar with the
                                partitioning code would answer.)</div>
                              <div><br>
                              </div>
                              <div>My understanding of the code is that
                                decreasing the subset size will increase
                                the number of partitions but will not
                                change the overall graph coverage.
                                Therefore, I would not expect it to
                                lower memory requirements. (The overhead
                                from additional partitions might raise
                                them some, but I have not analyzed the
                                code deeply enough to say one way or
                                another about that.) As far as changing
                                the number of threads goes, each thread
                                does seem to maintain a local list of
                                traversed k-mers (hidden in the C++
                                implementation) but I do not yet know
                                how much that would impact memory usage.
                                Have you tried using fewer
                                threads?</div>
                              <div><br>
                              </div>
                              <div>But, rather than guessing about
                                causation, let&#39;s try to get some more
                                diagnostic information. Does the script
                                die immediately? (How long does the PBS
                                job execute before failure?) Can you
                                attach the output and error files for a
                                job, and also the job script? What does</div>
                              <div>  qstat -f &lt;job_id&gt;</div>
                              <div>where &lt;job_id&gt; is the ID of
                                your running job, tell you about memory
                                usage?</div>
                              <div><br>
                              </div>
                              <div>Thanks,</div>
                              <div>  Eric</div>
                              <div><br>
                              </div>
                              <div><br>
                              </div>
                            </div>
                            <div class="gmail_extra"><br>
                              <br>
                              <div class="gmail_quote">On Mon, Apr 8,
                                2013 at 3:34 AM, Jens-Konrad Preem <span dir="ltr">&lt;<a href="mailto:jpreem@ut.ee" target="_blank">jpreem@ut.ee</a>&gt;</span>
                                wrote:<br>
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
                                  I am having trouble completing a
                                  partition-graph.py job.<br>
                                  No matter the configuration, it seems
                                  to terminate with error messages
                                  hinting at low memory etc. *<br>
                                  Does LOWering the subset size reduce
                                  the memory use? What about LOWering
                                  the number of parallel threads?<br>
                                  The <a href="http://graafik.ht" target="_blank">graafik.ht</a> file is
                                  5.2G large; I had the script running
                                  as a PBS job with 240 GB RAM
                                  allocated. (That&#39;s as much as I can
                                  get; maybe I&#39;ll have an opportunity
                                  in the next week to double it, but I
                                  wouldn&#39;t count on it.)<br>
                                  Is it expected for the script to
                                  require so much RAM, or is there some
                                  bug or some misuse on my part? Would
                                  there be any configuration to get past
                                  this?<br>
                                  <br>
                                  Jens-Konrad Preem, MSc., University of
                                  Tartu<br>
                                  <br>
                                  <br>
                                  <br>
                                  * the latest configuration after I
                                  thought on smaller subset size<br>
                                  ./khmer/scripts/partition-graph.py
                                   --threads 24 --subset-size 1e4
                                  graafik<br>
                                  terminated with<br>
                                  cannot allocate memory for
                                  thread-local data: ABORT<br>
                                  <br>
                                  <br>
_______________________________________________<br>
                                  khmer mailing list<br>
                                  <a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a><br>
                                  <a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>
                                </blockquote>
                              </div>
                              <br>
                              <br clear="all">
                              <div><br>
                              </div>
                              -- <br>
                              <div dir="ltr">
                                <div>Eric McDonald</div>
                                <div>HPC/Cloud Software Engineer</div>
                                <div>  for the Institute for
                                  Cyber-Enabled Research (iCER)</div>
                                <div>  and the Laboratory for Genomics,
                                  Evolution, and Development (GED)</div>
                                <div>Michigan State University</div>
                                <div>P: <a href="tel:517-355-8733" value="+15173558733" target="_blank">517-355-8733</a></div>
                              </div>
                            </div>
                          </blockquote>
                          <br>
                        </div>
                      </div>
                      <span><font color="#888888">
                          <pre cols="72">-- 
Jens-Konrad Preem, MSc, University of Tartu</pre>
                        </font></span></div>
                    <br>
                    <br>
                  </blockquote>
                </div>
                <br>
                <br clear="all">
                <div><br>
                </div>
              </div>
            </div>
          </div>
        </div>
        <br>
        <br clear="all">
        <div><br>
        </div>
      </div>
      <br>
      <fieldset></fieldset>
      <br>
    </blockquote></div></div>
    OK.<br>
    I am posting a failed run complete with PBS script, error log, and
    qstat -f snapshots at different times.<br>
    I find it weird that I managed to complete the test run on
    iowa-corn50M, which had an even larger graph file. Might the number of
    threads used pump up the memory? I used the sample commands from the
    web page for corn. These used 4 threads at most. <br><span class="HOEnZb"><font color="#888888">
    Jens-Konrad Preem<br>
  </font></span></div>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Eric McDonald</div><div>HPC/Cloud Software Engineer</div><div>  for the Institute for Cyber-Enabled Research (iCER)</div><div>  and the Laboratory for Genomics, Evolution, and Development (GED)</div>
<div>Michigan State University</div><div>P: 517-355-8733</div></div>
</div>