<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>This long wait is probably a sign that you have a highly connected graph. We usually attribute that to the presence of sequencing artifacts, which have to be removed either via filter-below-abund or find-knots; do-partition can't do it itself. &nbsp;Take a look at the handbook or the info on partitioning large data.</div><div><br></div><div>In your case I think your data may be small enough to assemble just after diginorm.<br><br><div>---</div>C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a></div><div><br>On Mar 21, 2013, at 8:50, Eric McDonald &lt;<a href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Thanks for the information, Alexis. If you are using 20 threads, then 441 / 20 is about 22 hours of elapsed time. So, it appears that all of the threads are working. (There is the possibility that they could be busy-waiting somewhere, but I didn't see any explicit opportunities for that from reading the 'do-partition.py' code.) Since you haven't seen .pmap files yet and since multithreaded execution is occurring, I expect that execution is currently at the following place in the script:<div>
&nbsp;&nbsp;<a href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a></div><div><br></div><div style="">I am not familiar with the 'do_subset_partition' code, but will try to analyze it later today. However, I would also listen to what Adina is saying - this step may just take a long time....</div>
<div style=""><br></div><div style="">Eric</div><div style=""><br></div><div style="">P.S. If you want to check on the output from the script, you could look in /var/spool/PBS/mom_priv (or equivalent) on the node where the job is running to see what the spooled output looks like thus far. (There should be a file named with the job ID and either a ".ER" or ".OU" extension, if I recall correctly, though it has been a while since I administered your kind of batch system.) You may need David to do this, as permissions on that directory are typically restrictive.</div>
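<div><br></div><div>For example, something like the following could be used (a sketch only; the exact spool layout and file naming vary between PBS/TORQUE installations, and the job ID shown is hypothetical):</div>

```shell
# Look for the spooled stdout/stderr of a running PBS job (assumption:
# TORQUE-style layout under /var/spool/PBS/mom_priv; job ID 12345 is
# hypothetical -- substitute your own).
JOBID=12345
ls /var/spool/PBS/mom_priv 2>/dev/null | grep "$JOBID" \
  || echo "no spool files visible"
```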
<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 21, 2013 at 5:40 AM, Alexis Groppi <span dir="ltr">&lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    A clarification:<br>
    <br>
    The file submitted to do-partition.py (file.below) contains
    2,576,771 reads.<br>
    The job was launched with the following command:<br>
    khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20 file.graphbase
    file.below<br>
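    <div>(As a quick sanity check, a read count like that can be recomputed by counting FASTA header lines. The snippet below runs on a tiny synthetic sample; on the real input the equivalent would be grep -c '^&gt;' file.below.)</div>

```shell
# Count reads in a FASTA file by counting header lines (assumption: one '>'
# header per record). Shown on a tiny synthetic sample; on the real data
# this would be:  grep -c '^>' file.below
printf '>read1\nACGT\n>read2\nTTGA\n' > /tmp/sample.fasta
grep -c '^>' /tmp/sample.fasta   # prints 2
```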
    <br>
    Alexis<br>
    <br>
    <br>
    <div>On 21/03/2013 at 10:13, Alexis Groppi
      wrote:<br>
    </div><div><div class="h5">
    <blockquote type="cite">
      
      Hi Eric,<br>
      <br>
       The do-partition.py script has now been running for 22 hours.<br>
       Only the file.info has been generated; no .pmap files have been created.<br>
      <br>
       qstat -f gives:<br>
      &nbsp;&nbsp;&nbsp; resources_used.cput = 441:04:21<br>
      &nbsp;&nbsp;&nbsp; resources_used.mem = 12764228kb<br>
      &nbsp;&nbsp;&nbsp; resources_used.vmem = 13926732kb<br>
      &nbsp;&nbsp;&nbsp; resources_used.walltime = 22:05:56<br>
      <br>
       The server has 256 GB of RAM and 256 GB of swap
       space.<br>
      <br>
       What do you think?<br>
      <br>
      Thanks<br>
      <br>
      Alexis<br>
      <br>
       <div>On 20/03/2013 at 16:43, Alexis Groppi
         wrote:<br>
      </div>
      <blockquote type="cite">
        
        Hi Eric,<br>
        <br>
         Actually, the previous job was killed when it hit the
         walltime limit.<br>
         I relaunched the script.<br>
         qstat -fr gives:<br>
        &nbsp;&nbsp;&nbsp; resources_used.cput = 93:23:08<br>
        &nbsp;&nbsp;&nbsp; resources_used.mem = 12341932kb<br>
        &nbsp;&nbsp;&nbsp; resources_used.vmem = 13271372kb<br>
        &nbsp;&nbsp;&nbsp; resources_used.walltime = 04:42:39<br>
        <br>
         At this moment only the file.info has been generated.<br>
        <br>
        Let's wait and see ...<br>
        <br>
        Thanks again<br>
        <br>
        Alexis<br>
        <br>
        <br>
         <div>On 19/03/2013 at 21:50, Eric McDonald
           wrote:<br>
        </div>
        <blockquote type="cite">
          <div dir="ltr">Hi Alexis,
            <div><br>
            </div>
            <div>What does:</div>
            <div>&nbsp; qstat -f &lt;job-id&gt;</div>
            <div>(where &lt;job-id&gt; is the ID of your job)
              report for the following fields:</div>
            <div>&nbsp;&nbsp;resources_used.cput</div>
            <div>&nbsp;&nbsp;resources_used.vmem</div>
            <div><br>
            </div>
            <div>And how do those values compare to actual
              amount of elapsed time for the job, the amount of physical
              memory on the node, and the total memory (RAM + swap
              space) on the node?</div>
            <div>Just checking to make sure that everything is
              running as it should be and that your process is not
              swapping heavily.</div>
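            <div><br></div><div>(To pull just those fields out of the qstat output, something like the following works; the sample lines below are copied from this thread, and in practice you would pipe the real qstat -f &lt;job-id&gt; instead.)</div>

```shell
# Extract the resources_used fields of interest from `qstat -f` output.
# The sample text is copied from earlier in this thread; on a live system:
#   qstat -f <job-id> | grep -E 'resources_used\.(cput|vmem)'
qstat_output='    resources_used.cput = 441:04:21
    resources_used.mem = 12764228kb
    resources_used.vmem = 13926732kb
    resources_used.walltime = 22:05:56'
printf '%s\n' "$qstat_output" | grep -E 'resources_used\.(cput|vmem)'
```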
            <div><br>
            </div>
            <div>Thanks,</div>
            <div>&nbsp; Eric</div>
            <div><br>
            </div>
          </div>
          <div class="gmail_extra"><br>
            <br>
            <div class="gmail_quote">On Tue, Mar 19, 2013 at 11:23 AM,
              Alexis Groppi <span dir="ltr">&lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"> Hi Adina,<br>
                  <br>
                   First of all, thanks for your answer and your
                   advice :)<br>
                   The extract-partitions.py script works!<br>
                   do-partition.py on my second set has now been
                   running for 32 hours. Shouldn't it have produced at
                   least one temporary .pmap file?<br>
                  <br>
                  Thanks again<br>
                  <br>
                  Alexis<br>
                  <br>
                   <div>On 19/03/2013 at 12:58, Adina Chuang Howe wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div>
                      <div><br>
                        <br>
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Message: 1<br>
                            Date: Tue, 19 Mar 2013 10:41:45 +0100<br>
                            From: Alexis Groppi &lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>
                            Subject: [khmer] Duration of do-partition.py
                            (very long !)<br>
                            To: <a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a><br>
                            Message-ID: &lt;<a href="mailto:514832D9.7090207@u-bordeaux2.fr" target="_blank">514832D9.7090207@u-bordeaux2.fr</a>&gt;<br>
                            Content-Type: text/plain;
                            charset="iso-8859-1"; Format="flowed"<br>
                            <br>
                            Hi Titus,<br>
                            <br>
                             After digital normalization and
                             filter-below-abund, on your advice I<br>
                             ran do-partition.py on 2
                             sets of data (approx. 2.5 million<br>
                             reads (75 nt) each):<br>
                            <br>
                            /khmer-BETA/scripts/do-partition.py -k 20 -x
                            1e9<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below<br>
                            and<br>
                            /khmer-BETA/scripts/do-partition.py -k 20 -x
                            1e9<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below<br>
                            <br>
                             For the first one I got a<br>
                             174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
                             with the<br>
                             information: 33 subsets total.<br>
                             Thereafter, 33 .pmap files (0.pmap through
                             32.pmap) were created at regular intervals,<br>
                             and finally I got a single file,<br>
                             174r1_prinseq_good_bFr8.fasta.keep.below.part
                             (all the .pmap files were<br>
                             deleted).<br>
                             This run lasted approx. 56 hours.<br>
                            <br>
                             For the second set (174r2), do-partition.py
                             has been running for 32 hours,<br>
                             but I have only got the<br>
                             174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
                             with the<br>
                             information: 35 subsets total.<br>
                             And nothing more...<br>
                            <br>
                             Is this duration "normal"?<br>
                          </blockquote>
                          <div><br>
                          </div>
                           <div>Yes, this is typical. &nbsp;The longest I've
                             had it run is 3 weeks, for very large
                             datasets (billions of reads). &nbsp;In general,
                             partitioning is the most time-consuming of
                             all the steps. &nbsp;Once it's finished, you'll
                             have much smaller files, which can be
                             assembled very quickly. &nbsp;Since I run
                             assembly with multiple assemblers and
                             multiple K lengths, this gain is often
                             significant for me.</div>
                          <div><br>
                          </div>
                          <div>To get the actual partitioned files, you
                            can use the following script:</div>
                          <div><br>
                          </div>
                          <div><a href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py" target="_blank">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a></div>
                          <div><br>
                          </div>
                           <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> (The
                             thread parameter is at its default (4
                             threads).)<br>
                             33 subsets and only one file at the end?<br>
                             Should I stop do-partition.py on the second
                             set and rerun it with more<br>
                             threads?<br>
                            <br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>I'd suggest letting it run.</div>
                          <div><br>
                          </div>
                          <div>Best,</div>
                          <div>Adina</div>
                        </div>
                        <br>
                        <fieldset></fieldset>
                        <br>
                      </div>
                    </div>
                    <pre>_______________________________________________
khmer mailing list
<a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a>
<a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><span><font color="#888888">
</font></span></pre>
                    <span><font color="#888888"> </font></span></blockquote>
                  <span><font color="#888888"> <br>
                    </font></span></div>
                <br>
              </blockquote>
            </div>
            <br>
            <br clear="all">
            <div><br>
            </div>
            -- <br>
            <div dir="ltr">
              <div>Eric McDonald</div>
              <div>HPC/Cloud Software Engineer</div>
              <div>&nbsp; for the Institute for Cyber-Enabled Research (iCER)</div>
              <div>&nbsp; and the Laboratory for Genomics, Evolution, and
                Development (GED)</div>
              <div>Michigan State University</div>
              <div>P: <a href="tel:517-355-8733" value="+15173558733" target="_blank">517-355-8733</a></div>
            </div>
          </div>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
    </div></div><span class="HOEnZb"><font color="#888888">
  </font></span></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Eric McDonald</div><div>HPC/Cloud Software Engineer</div><div>&nbsp; for the Institute for Cyber-Enabled Research (iCER)</div><div>&nbsp; and the Laboratory for Genomics, Evolution, and Development (GED)</div>
<div>Michigan State University</div><div>P: 517-355-8733</div></div>
</div>
</div></blockquote></body></html>