<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Titus,<br>
    <br>
    Thanks for your answer. The input file I use should not contain this
    artefact, because it has already been through filter-below-abund.<br>
    I will try find-knots and then filter-stoptags.<br>
    Regarding your last suggestion: what is the size limit?<br>
    A side question: Eric told me "Titus created a guide about what
    size hash table to generally use with certain kinds of data".<br>
    If possible, I would be very interested in having this guide.<br>
    <br>
    Thanks again<br>
    <br>
    Alexis<br>
    <br>
    <div class="moz-cite-prefix">On 21/03/2013 14:14, C. Titus Brown
      wrote:<br>
    </div>
    <blockquote cite="mid:78DDEAC6-43ED-4825-B61B-D57ABE904A05@msu.edu"
      type="cite">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div>This long wait is probably a sign that you have a highly
        connected graph. We usually attribute that to the presence of
        sequencing artifacts, which have to be removed with either
        filter-below-abund or find-knots; do-partition can't do it
        itself.  Take a look at the handbook or the info on partitioning
        large data.</div>
      <div><br>
      </div>
      <div>In your case I think your data may be small enough to
        assemble just after diginorm.<br>
        <br>
        <div>---</div>
        C. Titus Brown, <a moz-do-not-send="true"
          href="mailto:ctb@msu.edu">ctb@msu.edu</a></div>
      <div><br>
        On Mar 21, 2013, at 8:50, Eric McDonald &lt;<a
          moz-do-not-send="true" href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>&gt;
        wrote:<br>
        <br>
      </div>
      <blockquote type="cite">
        <div>
          <div dir="ltr">Thanks for the information, Alexis. If you are
            using 20 threads, then 441 / 20 is about 22 hours of elapsed
            time. So, it appears that all of the threads are working.
            (There is the possibility that they could be busy-waiting
            somewhere, but I didn't see any explicit opportunities for
            that from reading the 'do-partition.py' code.) Since you
            haven't seen .pmap files yet and since multithreaded
            execution is occurring, I expect that execution is currently
            at the following place in the script:
            <div>
                <a moz-do-not-send="true"
href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a></div>
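            <div>The thread estimate above can be checked with a few
              lines of Python. This is only a rough sketch: the helper
              name hms_to_hours is made up for illustration, and the two
              values are the resources_used.cput and
              resources_used.walltime figures quoted further down in
              this thread.</div>
            <pre>
```python
def hms_to_hours(hms):
    """Convert an 'HH:MM:SS' string, as printed by qstat -f, to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


# Values copied from the qstat -f output quoted in this thread.
cput = hms_to_hours("441:04:21")      # resources_used.cput
walltime = hms_to_hours("22:05:56")   # resources_used.walltime

# CPU time divided by elapsed time approximates how many threads
# were busy on average over the life of the job.
busy_threads = cput / walltime
print(round(busy_threads, 2))
```
            </pre>
            <div>A ratio close to the -T value (20 here) suggests that
              all threads are consuming CPU. It cannot tell useful work
              apart from busy-waiting.</div>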
            <div><br>
            </div>
            <div style="">I am not familiar with the
              'do_subset_partition' code, but will try to analyze it
              later today. However, I would also listen to what Adina is
              saying - this step may just take a long time....</div>
            <div style=""><br>
            </div>
            <div style="">Eric</div>
            <div style=""><br>
            </div>
            <div style="">P.S. If you want to check on the output from
              the script, you could look in /var/spool/PBS/mom_priv (or
              equivalent) on the node where the job is running to see
              what the spooled output looks like thus far. (There should
              be a file named with the job ID and either a ".ER" or
              ".OU" extension, if I recall correctly, though it has been
              awhile since I have administered your kind of batch
              system.) You may need David to do this as the permissions
              to the directory are typically restrictive.</div>
            <div><br>
            </div>
          </div>
          <div class="gmail_extra"><br>
            <br>
            <div class="gmail_quote">On Thu, Mar 21, 2013 at 5:40 AM,
              Alexis Groppi <span dir="ltr">&lt;<a
                  moz-do-not-send="true"
                  href="mailto:alexis.groppi@u-bordeaux2.fr"
                  target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
              wrote:<br>
              <blockquote class="gmail_quote" style="margin:0 0 0
                .8ex;border-left:1px #ccc solid;padding-left:1ex">
                <div text="#000000" bgcolor="#FFFFFF"> A clarification:<br>
                  <br>
                  The file submitted to the do-partition.py script
                  contains 2576771 reads (file.below).<br>
                  The job was launched with the following options:<br>
                  khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20
                  file.graphbase file.below<br>
                  <br>
                  Alexis<br>
                  <br>
                  <br>
                  <div>On 21/03/2013 10:13, Alexis Groppi wrote:<br>
                  </div>
                  <div>
                    <div class="h5">
                      <blockquote type="cite"> Hi Eric,<br>
                        <br>
                        The script do-partition.py has now been running
                        for 22 hours.<br>
                        Only <a moz-do-not-send="true"
                          href="http://file.info" target="_blank">file.info</a>
                        has been generated. No .pmap files have been created.<br>
                        <br>
                        qstat -f gives :<br>
                            resources_used.cput = 441:04:21<br>
                            resources_used.mem = 12764228kb<br>
                            resources_used.vmem = 13926732kb<br>
                            resources_used.walltime = 22:05:56<br>
                        <br>
                        The server has 256 GB of RAM, and
                        the swap space is also 256 GB.<br>
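                        <div>As a quick sanity check on whether the job
                          could be swapping, the memory figures above
                          can be compared with the node's RAM. This is a
                          minimal sketch under two assumptions: that
                          the "kb" suffix printed by PBS means units of
                          1024 bytes, and that the 256 figure quoted for
                          RAM can be read as 256 GiB. The helper name
                          pbs_kb_to_gib is hypothetical.</div>
                        <pre>
```python
RAM_GIB = 256  # assumed: the RAM size quoted in this thread, read as GiB


def pbs_kb_to_gib(value):
    """Convert a PBS size string such as '12764228kb' to GiB."""
    if not value.lower().endswith("kb"):
        raise ValueError("expected a value ending in 'kb': " + value)
    # Assumption: PBS 'kb' counts units of 1024 bytes.
    return int(value[:-2]) / 1024 ** 2


mem_gib = pbs_kb_to_gib("12764228kb")    # resources_used.mem
vmem_gib = pbs_kb_to_gib("13926732kb")   # resources_used.vmem

# A positive headroom means the whole virtual size fits in physical RAM,
# so heavy swapping by this job alone is unlikely.
headroom_gib = RAM_GIB - vmem_gib
print(round(mem_gib, 2), round(vmem_gib, 2), round(headroom_gib, 1))
```
                        </pre>
                        <div>This only looks at one job. Other jobs on
                          the same node could still push it into swap.</div>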
                        <br>
                        What do you think?<br>
                        <br>
                        Thanks<br>
                        <br>
                        Alexis<br>
                        <br>
                        <div>On 20/03/2013 16:43, Alexis Groppi
                          wrote:<br>
                        </div>
                        <blockquote type="cite"> Hi Eric,<br>
                          <br>
                          Actually, the previous job was killed when it
                          reached the walltime limit.<br>
                          I relaunched the script.<br>
                          qstat -fr gives :    <br>
                              resources_used.cput = 93:23:08<br>
                              resources_used.mem = 12341932kb<br>
                              resources_used.vmem = 13271372kb<br>
                              resources_used.walltime = 04:42:39<br>
                          <br>
                          At this moment only the <a
                            moz-do-not-send="true"
                            href="http://file.info" target="_blank">file.info</a>
                          has been generated.<br>
                          <br>
                          Let's wait and see ...<br>
                          <br>
                          Thanks again<br>
                          <br>
                          Alexis<br>
                          <br>
                          <br>
                          <div>On 19/03/2013 21:50, Eric McDonald
                            wrote:<br>
                          </div>
                          <blockquote type="cite">
                            <div dir="ltr">Hi Alexis,
                              <div><br>
                              </div>
                              <div>What does:</div>
                              <div>  qstat -f &lt;job-id&gt;</div>
                              <div>where &lt;job-id&gt; is the ID of
                                your job tell you for the following
                                fields:</div>
                              <div>  resources_used.cput</div>
                              <div>  resources_used.vmem</div>
                              <div><br>
                              </div>
                              <div>And how do those values compare to
                                actual amount of elapsed time for the
                                job, the amount of physical memory on
                                the node, and the total memory (RAM +
                                swap space) on the node?</div>
                              <div>Just checking to make sure that
                                everything is running as it should be
                                and that your process is not heavily
                                into swap or something like that.</div>
                              <div><br>
                              </div>
                              <div>Thanks,</div>
                              <div>  Eric</div>
                              <div><br>
                              </div>
                            </div>
                            <div class="gmail_extra"><br>
                              <br>
                              <div class="gmail_quote">On Tue, Mar 19,
                                2013 at 11:23 AM, Alexis Groppi <span
                                  dir="ltr">&lt;<a
                                    moz-do-not-send="true"
                                    href="mailto:alexis.groppi@u-bordeaux2.fr"
                                    target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
                                wrote:<br>
                                <blockquote class="gmail_quote"
                                  style="margin:0 0 0
                                  .8ex;border-left:1px #ccc
                                  solid;padding-left:1ex">
                                  <div text="#000000" bgcolor="#FFFFFF">
                                    Hi Adina,<br>
                                    <br>
                                    First of all, thanks for your answer
                                    and your advice :)<br>
                                    The script extract-partitions.py
                                    works!<br>
                                    As for do-partition.py on my second
                                    set, it has been running for 32 hours. Should
                                    it not have produced at least one
                                    temporary .pmap file?<br>
                                    <br>
                                    Thanks again<br>
                                    <br>
                                    Alexis<br>
                                    <br>
                                    <div>On 19/03/2013 12:58, Adina
                                      Chuang Howe wrote:<br>
                                    </div>
                                    <blockquote type="cite">
                                      <div>
                                        <div><br>
                                          <br>
                                          <div class="gmail_quote">
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              Message: 1<br>
                                              Date: Tue, 19 Mar 2013
                                              10:41:45 +0100<br>
                                              From: Alexis Groppi &lt;<a
                                                moz-do-not-send="true"
                                                href="mailto:alexis.groppi@u-bordeaux2.fr"
                                                target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>
                                              Subject: [khmer] Duration
                                              of do-partition.py (very
                                              long !)<br>
                                              To: <a
                                                moz-do-not-send="true"
                                                href="mailto:khmer@lists.idyll.org"
                                                target="_blank">khmer@lists.idyll.org</a><br>
                                              Message-ID: &lt;<a
                                                moz-do-not-send="true"
                                                href="mailto:514832D9.7090207@u-bordeaux2.fr"
                                                target="_blank">514832D9.7090207@u-bordeaux2.fr</a>&gt;<br>
                                              Content-Type: text/plain;
                                              charset="iso-8859-1";
                                              Format="flowed"<br>
                                              <br>
                                              Hi Titus,<br>
                                              <br>
                                              After digital
                                              normalization and
                                              filter-below-abund, on
                                              your advice I<br>
                                              ran do-partition.py
                                              on 2 sets of data (approx.
                                              2.5 million<br>
                                              reads (75 nt)):<br>
                                              <br>
                                              /khmer-BETA/scripts/do-partition.py
                                              -k 20 -x 1e9<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below<br>
                                              and<br>
                                              /khmer-BETA/scripts/do-partition.py
                                              -k 20 -x 1e9<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below<br>
                                              <br>
                                              For the first one I got a<br>
                                              <a moz-do-not-send="true"
href="http://174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info"
                                                target="_blank">174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info</a>
                                              with the<br>
                                              information : 33 subsets
                                              total<br>
                                              After that, 33 .pmap files
                                              (0.pmap to 32.pmap)
                                              were created at regular intervals<br>
                                              and finally I got a single
                                              file,<br>
                                              174r1_prinseq_good_bFr8.fasta.keep.below.part<br>
                                              (all the .pmap files were<br>
                                              deleted).<br>
                                              This run took
                                              approx. 56 hours.<br>
                                              <br>
                                              For the second set
                                              (174r2), do-partition.py
                                              has been running for 32 hours<br>
                                              but I only got the<br>
                                              <a moz-do-not-send="true"
href="http://174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info"
                                                target="_blank">174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info</a>
                                              with the<br>
                                              information : 35 subsets
                                              total<br>
                                              And nothing more...<br>
                                              <br>
                                              Is this duration "normal"
                                              ?<br>
                                            </blockquote>
                                            <div><br>
                                            </div>
                                            <div>Yes, this is typical.
                                               The longest I've had it
                                              run is 3 weeks, for a very
                                              large data set (billions of reads).
                                               In general, partitioning
                                              is the most time-consuming
                                              of all the steps.  Once
                                              it's finished, you'll have
                                              much smaller files which
                                              can be assembled very
                                              quickly.  Since I run
                                              assembly with multiple
                                              assemblers and with
                                              multiple K lengths, this
                                              gain is often significant
                                              for me.</div>
                                            <div><br>
                                            </div>
                                            <div>To get the actual
                                              partitioned files, you can
                                              use the following script:</div>
                                            <div><br>
                                            </div>
                                            <div><a
                                                moz-do-not-send="true"
href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py"
                                                target="_blank">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a></div>
                                            <div><br>
                                            </div>
                                            <blockquote
                                              class="gmail_quote"
                                              style="margin:0 0 0
                                              .8ex;border-left:1px #ccc
                                              solid;padding-left:1ex">
                                              (The thread setting was
                                              left at the default (4
                                              threads).)<br>
                                              33 subsets and only one
                                              file at the end?<br>
                                              Should I stop
                                              do-partition.py on the
                                              second set and rerun it
                                              with more<br>
                                              threads?<br>
                                              <br>
                                            </blockquote>
                                            <div><br>
                                            </div>
                                            <div>I'd suggest letting it
                                              run.</div>
                                            <div><br>
                                            </div>
                                            <div>Best,</div>
                                            <div>Adina</div>
                                          </div>
                                          <br>
                                          <fieldset></fieldset>
                                          <br>
                                        </div>
                                      </div>
                                      <pre>_______________________________________________
khmer mailing list
<a moz-do-not-send="true" href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a>
<a moz-do-not-send="true" href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><span><font color="#888888">
</font></span></pre>
                                      <span><font color="#888888"> </font></span></blockquote>
                                    <span><font color="#888888"> <br>
                                      </font></span></div>
                                  <br>
                                </blockquote>
                              </div>
                              <br>
                              <br clear="all">
                              <div><br>
                              </div>
                              -- <br>
                              <div dir="ltr">
                                <div>Eric McDonald</div>
                                <div>HPC/Cloud Software Engineer</div>
                                <div>  for the Institute for
                                  Cyber-Enabled Research (iCER)</div>
                                <div>  and the Laboratory for Genomics,
                                  Evolution, and Development (GED)</div>
                                <div>Michigan State University</div>
                                <div>P: <a moz-do-not-send="true"
                                    href="tel:517-355-8733"
                                    value="+15173558733" target="_blank">517-355-8733</a></div>
                              </div>
                            </div>
                          </blockquote>
                          <br>
                        </blockquote>
                        <br>
                      </blockquote>
                      <br>
                    </div>
                  </div>
                  <span class="HOEnZb"><font color="#888888">
                    </font></span></div>
              </blockquote>
            </div>
            <br>
            <br clear="all">
            <div><br>
            </div>
            -- <br>
            <div dir="ltr">
              <div>Eric McDonald</div>
              <div>HPC/Cloud Software Engineer</div>
              <div>  for the Institute for Cyber-Enabled Research (iCER)</div>
              <div>  and the Laboratory for Genomics, Evolution, and
                Development (GED)</div>
              <div>Michigan State University</div>
              <div>P: 517-355-8733</div>
            </div>
          </div>
        </div>
      </blockquote>
    </blockquote>
    <br>
    <div class="moz-signature">-- <br>
      <img src="cid:part22.07030007.02090608@u-bordeaux2.fr" border="0"></div>
  </body>
</html>