<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Adina,<br>

    <br>

    First of all, thanks for your answer and your advice :)<br>

    The extract-partitions.py script works!<br>

    As for do-partition.py on my second set, it has been running for 32 hours.

    Shouldn't it have produced at least one temporary .pmap file by now?<br>
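    <br>
    A minimal sketch of how this could be checked from Python, assuming the
    temporary files end in N.pmap as they did for the first set (0.pmap to
    32.pmap). The helper names pmap_progress and report are hypothetical and
    not part of khmer; the exact file prefix written by do-partition.py is
    deliberately not assumed.<br>
    <pre>
```python
import os
import re

# Matches names ending in "N.pmap" (for example "0.pmap" ... "32.pmap"), as
# described in this thread. The prefix used by do-partition.py is not assumed.
PMAP_PATTERN = re.compile(r"(\d+)\.pmap$")


def pmap_progress(filenames, total_subsets):
    """Return (finished, remaining) subset counts from a list of file names."""
    finished = set()
    for name in filenames:
        match = PMAP_PATTERN.search(name)
        if match:
            # Collect distinct subset indices; duplicates count once.
            finished.add(int(match.group(1)))
    return len(finished), max(total_subsets - len(finished), 0)


def report(directory, total_subsets):
    """Print how many subsets currently have a .pmap file in `directory`."""
    done, left = pmap_progress(os.listdir(directory), total_subsets)
    print(f"{done} of {total_subsets} subsets have a .pmap file; {left} remaining")
```
    </pre>
    For my second set this would be called as report("/ag/khmer/Sample_174", 35),
    where 35 is the subset count given in the .info file. If .pmap files from
    several runs share a directory, equal indices are counted only once.<br>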

    <br>

    Thanks again<br>

    <br>

    Alexis<br>

    <br>

    <div class="moz-cite-prefix">On 19/03/2013 12:58, Adina Chuang Howe

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAO-C1xUHSWEaRMZHoaS3UZBfs67zeo7TOfi9pab+_rcNixQMRA@mail.gmail.com"

      type="cite"><br>

      <br>

      <div class="gmail_quote">

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          Message: 1<br>

          Date: Tue, 19 Mar 2013 10:41:45 +0100<br>

          From: Alexis Groppi &lt;<a moz-do-not-send="true"

            href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>

          Subject: [khmer] Duration of do-partition.py (very long !)<br>

          To: <a moz-do-not-send="true"

            href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a><br>

          Message-ID: &lt;<a moz-do-not-send="true"

            href="mailto:514832D9.7090207@u-bordeaux2.fr">514832D9.7090207@u-bordeaux2.fr</a>&gt;<br>

          Content-Type: text/plain; charset="iso-8859-1";

          Format="flowed"<br>

          <br>

          Hi Titus,<br>

          <br>

          After digital normalization and filter-below-abund, on your

          advice I<br>

          ran do-partition.py

          on 2 sets of data (approx. 2.5 million<br>

          reads (75 nt)):<br>

          <br>

          /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9<br>

/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase<br>

          /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below<br>

          and<br>

          /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9<br>

/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase<br>

          /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below<br>

          <br>

          For the first one I got a<br>

          174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info

          file with the<br>

          information: 33 subsets total<br>

          After that, 33 .pmap files, from 0.pmap to 32.pmap, were created

          at regular intervals,<br>

          and finally I got a single file,<br>

          174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap

          files were<br>

          deleted).<br>

          This run took approx. 56 hours.<br>

          <br>

          For the second set (174r2), do-partition.py has been running for

          32 hours,<br>

          but so far I only have the<br>

          174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info

          file with the<br>

          information: 35 subsets total<br>

          And nothing more...<br>

          <br>

          Is this duration "normal"?<br>

        </blockquote>

        <div><br>

        </div>

        <div>Yes, this is typical. &nbsp;The longest I've had it run is 3

          weeks, for a very large dataset (billions of reads). &nbsp;In general,

          partitioning is the most time-consuming of all the steps.

          &nbsp;Once it's finished, you'll have much smaller files which can

          be assembled very quickly. &nbsp;Since I run assembly with multiple

          assemblers and with multiple K lengths, this gain is often

          significant for me.</div>

        <div><br>

        </div>

        <div>To get the actual partitioned files, you can use the

          following script:</div>

        <div><br>

        </div>

        <div><a moz-do-not-send="true"

href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a></div>

        <div><br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          (The thread setting was left at the default (4 threads).)<br>

          33 subsets and only one file at the end?<br>

          Should I stop do-partition.py on the second set and rerun it

          with more<br>

          threads?<br>

          <br>

        </blockquote>

        <div><br>

        </div>

        <div>I'd suggest letting it run.</div>

        <div><br>

        </div>

        <div>Best,</div>

        <div>Adina</div>

      </div>

      <br>


      <br>

      <pre wrap="">_______________________________________________

khmer mailing list

<a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>

<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/khmer">http://lists.idyll.org/listinfo/khmer</a>

</pre>

    </blockquote>

    <br>


  </body>

</html>