<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div>This long wait is probably a sign that you have a highly connected graph. We usually attribute that to the presence of sequencing artifacts, which have to be removed either via filter-below-abund or find-knots; do-partition can't do it itself. &nbsp;Take a look at the handbook or the info on partitioning large data.</div><div><br></div><div>In your case I think your data may be small enough to assemble just after diginorm.<br><br><div>---</div>C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a></div><div><br>On Mar 21, 2013, at 8:50, Eric McDonald &lt;<a href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>&gt; wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr">Thanks for the information, Alexis. If you are using 20 threads, then 441 / 20 is about 22 hours of elapsed time. So, it appears that all of the threads are working. (There is the possibility that they could be busy-waiting somewhere, but I didn't see any explicit opportunities for that from reading the 'do-partition.py' code.) Since you haven't seen .pmap files yet and since multithreaded execution is occurring, I expect that execution is currently at the following place in the script:<div>
&nbsp;&nbsp;<a href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a></div><div><br></div><div style="">I am not familiar with the 'do_subset_partition' code, but will try to analyze it later today. However, I would also listen to what Adina is saying - this step may just take a long time....</div>
<div style=""><br></div><div style="">Eric</div><div style=""><br></div><div style="">P.S. If you want to check on the output from the script, you could look in /var/spool/PBS/mom_priv (or equivalent) on the node where the job is running to see what the spooled output looks like thus far. (There should be a file named with the job ID and either a ".ER" or ".OU" extension, if I recall correctly, though it has been a while since I administered your kind of batch system.) You may need David to do this, as permissions on that directory are typically restrictive.</div>
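<div><br></div><div>For example, something like the following could be used (a sketch only; the exact spool layout and file naming vary between PBS/TORQUE installations, and the job ID shown is hypothetical):</div>

```shell
# Look for the spooled stdout/stderr of a running PBS job (assumption:
# TORQUE-style layout under /var/spool/PBS/mom_priv; job ID 12345 is
# hypothetical -- substitute your own).
JOBID=12345
ls /var/spool/PBS/mom_priv 2>/dev/null | grep "$JOBID" \
  || echo "no spool files visible"
```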
<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 21, 2013 at 5:40 AM, Alexis Groppi <span dir="ltr">&lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    A clarification:<br>
    <br>
    The file submitted to do-partition.py (file.below) contains
    2,576,771 reads.<br>
    The job was launched with the following command:<br>
    khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20 file.graphbase
    file.below<br>
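    <div>(As a quick sanity check, a read count like that can be recomputed by counting FASTA header lines. The snippet below runs on a tiny synthetic sample; on the real input the equivalent would be grep -c '^&gt;' file.below.)</div>

```shell
# Count reads in a FASTA file by counting header lines (assumption: one '>'
# header per record). Shown on a tiny synthetic sample; on the real data
# this would be:  grep -c '^>' file.below
printf '>read1\nACGT\n>read2\nTTGA\n' > /tmp/sample.fasta
grep -c '^>' /tmp/sample.fasta   # prints 2
```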
    <br>
    Alexis<br>
    <br>
    <br>
    <div>On 21/03/2013 at 10:13, Alexis Groppi
      wrote:<br>
    </div><div><div class="h5">
    <blockquote type="cite">
      
      Hi Eric,<br>
      <br>
       The do-partition.py script has now been running for 22 hours.<br>
       Only the file.info has been generated; no .pmap files have been created.<br>
      <br>
       qstat -f gives:<br>
      &nbsp;&nbsp;&nbsp; resources_used.cput = 441:04:21<br>
      &nbsp;&nbsp;&nbsp; resources_used.mem = 12764228kb<br>
      &nbsp;&nbsp;&nbsp; resources_used.vmem = 13926732kb<br>
      &nbsp;&nbsp;&nbsp; resources_used.walltime = 22:05:56<br>
      <br>
       The server has 256 GB of RAM and 256 GB of swap
       space.<br>
      <br>
       What do you think?<br>
      <br>
      Thanks<br>
      <br>
      Alexis<br>
      <br>
       <div>On 20/03/2013 at 16:43, Alexis Groppi
         wrote:<br>
      </div>
      <blockquote type="cite">
        
        Hi Eric,<br>
        <br>
         Actually, the previous job was killed when it hit the
         walltime limit.<br>
         I relaunched the script.<br>
         qstat -fr gives:<br>
        &nbsp;&nbsp;&nbsp; resources_used.cput = 93:23:08<br>
        &nbsp;&nbsp;&nbsp; resources_used.mem = 12341932kb<br>
        &nbsp;&nbsp;&nbsp; resources_used.vmem = 13271372kb<br>
        &nbsp;&nbsp;&nbsp; resources_used.walltime = 04:42:39<br>
        <br>
         At this moment only the file.info has been generated.<br>
        <br>
        Let's wait and see ...<br>
        <br>
        Thanks again<br>
        <br>
        Alexis<br>
        <br>
        <br>
         <div>On 19/03/2013 at 21:50, Eric McDonald
           wrote:<br>
        </div>
        <blockquote type="cite">
          <div dir="ltr">Hi Alexis,
            <div><br>
            </div>
            <div>What does:</div>
            <div>&nbsp; qstat -f &lt;job-id&gt;</div>
            <div>(where &lt;job-id&gt; is the ID of your job)
              report for the following fields:</div>
            <div>&nbsp;&nbsp;resources_used.cput</div>
            <div>&nbsp;&nbsp;resources_used.vmem</div>
            <div><br>
            </div>
            <div>And how do those values compare to actual
              amount of elapsed time for the job, the amount of physical
              memory on the node, and the total memory (RAM + swap
              space) on the node?</div>
            <div>Just checking to make sure that everything is
              running as it should be and that your process is not
              swapping heavily.</div>
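            <div><br></div><div>(To pull just those fields out of the qstat output, something like the following works; the sample lines below are copied from this thread, and in practice you would pipe the real qstat -f &lt;job-id&gt; instead.)</div>

```shell
# Extract the resources_used fields of interest from `qstat -f` output.
# The sample text is copied from earlier in this thread; on a live system:
#   qstat -f <job-id> | grep -E 'resources_used\.(cput|vmem)'
qstat_output='    resources_used.cput = 441:04:21
    resources_used.mem = 12764228kb
    resources_used.vmem = 13926732kb
    resources_used.walltime = 22:05:56'
printf '%s\n' "$qstat_output" | grep -E 'resources_used\.(cput|vmem)'
```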
            <div><br>
            </div>
            <div>Thanks,</div>
            <div>&nbsp; Eric</div>
            <div><br>
            </div>
          </div>
          <div class="gmail_extra"><br>
            <br>
            <div class="gmail_quote">On Tue, Mar 19, 2013 at 11:23 AM,
              Alexis Groppi <span dir="ltr">&lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"> Hi Adina,<br>
                  <br>
                   First of all, thanks for your answer and your
                   advice :)<br>
                   The extract-partitions.py script works!<br>
                   do-partition.py on my second set has now been
                   running for 32 hours. Shouldn't it have produced at
                   least one temporary .pmap file?<br>
                  <br>
                  Thanks again<br>
                  <br>
                  Alexis<br>
                  <br>
                   <div>On 19/03/2013 at 12:58, Adina Chuang Howe wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div>
                      <div><br>
                        <br>
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Message: 1<br>
                            Date: Tue, 19 Mar 2013 10:41:45 +0100<br>
                            From: Alexis Groppi &lt;<a href="mailto:alexis.groppi@u-bordeaux2.fr" target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>
                            Subject: [khmer] Duration of do-partition.py
                            (very long !)<br>
                            To: <a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a><br>
                            Message-ID: &lt;<a href="mailto:514832D9.7090207@u-bordeaux2.fr" target="_blank">514832D9.7090207@u-bordeaux2.fr</a>&gt;<br>
                            Content-Type: text/plain;
                            charset="iso-8859-1"; Format="flowed"<br>
                            <br>
                            Hi Titus,<br>
                            <br>
                             After digital normalization and
                             filter-below-abund, on your advice I<br>
                             ran do-partition.py on 2
                             sets of data (approx. 2.5 million<br>
                             reads (75 nt) each):<br>
                            <br>
                            /khmer-BETA/scripts/do-partition.py -k 20 -x
                            1e9<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below<br>
                            and<br>
                            /khmer-BETA/scripts/do-partition.py -k 20 -x
                            1e9<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below<br>
                            <br>
                             For the first one I got a<br>
                             174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
                             with the<br>
                             information: 33 subsets total.<br>
                             Thereafter, 33 .pmap files (0.pmap through
                             32.pmap) were created at regular intervals,<br>
                             and finally I got a single file,<br>
                             174r1_prinseq_good_bFr8.fasta.keep.below.part
                             (all the .pmap files were<br>
                             deleted).<br>
                             This run lasted approx. 56 hours.<br>
                            <br>
                             For the second set (174r2), do-partition.py
                             has been running for 32 hours,<br>
                             but I have only got the<br>
                             174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
                             with the<br>
                             information: 35 subsets total.<br>
                             And nothing more...<br>
                            <br>
                             Is this duration "normal"?<br>
                          </blockquote>
                          <div><br>
                          </div>
                           <div>Yes, this is typical. &nbsp;The longest I've
                             had it run is 3 weeks, for very large
                             datasets (billions of reads). &nbsp;In general,
                             partitioning is the most time-consuming of
                             all the steps. &nbsp;Once it's finished, you'll
                             have much smaller files, which can be
                             assembled very quickly. &nbsp;Since I run
                             assembly with multiple assemblers and
                             multiple K lengths, this gain is often
                             significant for me.</div>
                          <div><br>
                          </div>
                          <div>To get the actual partitioned files, you
                            can use the following script:</div>
                          <div><br>
                          </div>
                          <div><a href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py" target="_blank">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a></div>
                          <div><br>
                          </div>
                           <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> (The
                             thread parameter is at its default (4
                             threads).)<br>
                             33 subsets and only one file at the end?<br>
                             Should I stop do-partition.py on the second
                             set and rerun it with more<br>
                             threads?<br>
                            <br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>I'd suggest letting it run.</div>
                          <div><br>
                          </div>
                          <div>Best,</div>
                          <div>Adina</div>
                        </div>
                        <br>
                        <fieldset></fieldset>
                        <br>
                      </div>
                    </div>
                    <pre>_______________________________________________
khmer mailing list
<a href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a>
<a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><span><font color="#888888">
</font></span></pre>
                    <span><font color="#888888"> </font></span></blockquote>
                  <span><font color="#888888"> <br>
                    </font></span></div>
                <br>
              </blockquote>
            </div>
            <br>
            <br clear="all">
            <div><br>
            </div>
            -- <br>
            <div dir="ltr">
              <div>Eric McDonald</div>
              <div>HPC/Cloud Software Engineer</div>
              <div>&nbsp; for the Institute for Cyber-Enabled Research (iCER)</div>
              <div>&nbsp; and the Laboratory for Genomics, Evolution, and
                Development (GED)</div>
              <div>Michigan State University</div>
              <div>P: <a href="tel:517-355-8733" value="+15173558733" target="_blank">517-355-8733</a></div>
            </div>
          </div>
        </blockquote>
        <br>
      </blockquote>
      <br>
    </blockquote>
    <br>
    </div></div><span class="HOEnZb"><font color="#888888">
  </font></span></div>

</blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Eric McDonald</div><div>HPC/Cloud Software Engineer</div><div>&nbsp; for the Institute for Cyber-Enabled Research (iCER)</div><div>&nbsp; and the Laboratory for Genomics, Evolution, and Development (GED)</div>
<div>Michigan State University</div><div>P: 517-355-8733</div></div>
</div>
</div></blockquote></body></html>