<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Sorry to bother you, but one thing is still unclear to me: <br>
<br>
To remove the artefacts: <br>
should I run find-knots on my file.below (i.e. after
normalize-by-median.py, load-into-counting.py and
filter-below-abund.py)?<br>
Then filter-stoptags?<br>
And will the data then be ready for assembly, or should I still run
do-partition.py on the artefact-free data?<br>
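For orientation, the order being asked about can be sketched using only the scripts named in this thread. This is a hypothetical sketch, not a prescribed command sequence: filenames are placeholders and each script's required options are elided.

```shell
# Hypothetical sketch only; filenames are placeholders and the
# elided "..." arguments stand for options not shown in this thread.
normalize-by-median.py reads.fasta                 # digital normalization
load-into-counting.py counts.kh reads.fasta.keep
filter-below-abund.py counts.kh reads.fasta.keep   # produces file.below
find-knots.py ...        # remove highly connected sequencing artefacts
filter-stoptags.py ...   # filter reads crossing the identified knots
do-partition.py ...      # then partition before assembly
```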
<br>
Thanks<br>
<br>
Alexis
<div class="moz-cite-prefix">On 21/03/2013 15:28, C. Titus Brown
wrote:<br>
</div>
<blockquote cite="mid:20130321142820.GA30052@idyll.org" type="cite">
<pre wrap="">On Thu, Mar 21, 2013 at 03:15:33PM +0100, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Thanks for your answer. The input file I am using should not contain
these artefacts, because it was produced by filter-below-abund.
I will try find-knots and then filter-stoptags.
About your last suggestion: what is the size limit?
A side question: Eric told me "Titus created a guide about what size
hash table to generally use with certain kinds of data."
If possible, I would be very interested in that guide.
</pre>
</blockquote>
<pre wrap="">
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/">http://khmer.readthedocs.org/en/latest/</a>
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/choosing-hash-sizes.html">http://khmer.readthedocs.org/en/latest/choosing-hash-sizes.html</a>
OK, you may have to use the find-knots stuff --
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/partitioning-big-data.html">http://khmer.readthedocs.org/en/latest/partitioning-big-data.html</a>
cheers,
--titus
</pre>
<blockquote type="cite">
<pre wrap="">On 21/03/2013 14:14, C. Titus Brown wrote:
</pre>
<blockquote type="cite">
<pre wrap="">This long wait is probably a sign that you have a highly connected
graph. We usually attribute that to the presence of sequencing
artifacts, which have to be removed either via filter-below-abund or
find-knots; do-partition can't do it by itself. Take a look at the
handbook or the info on partitioning large data sets.
In your case I think your data may be small enough to assemble just
after diginorm.
---
C. Titus Brown, <a class="moz-txt-link-abbreviated" href="mailto:ctb@msu.edu">ctb@msu.edu</a>
On Mar 21, 2013, at 8:50, Eric McDonald <<a class="moz-txt-link-abbreviated" href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Thanks for the information, Alexis. If you are using 20 threads, then
441 / 20 is about 22 hours of elapsed time. So, it appears that all
of the threads are working. (There is the possibility that they could
be busy-waiting somewhere, but I didn't see any explicit
opportunities for that from reading the 'do-partition.py' code.)
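The arithmetic above can be checked directly: `resources_used.cput` is CPU time summed over all threads, so dividing by the thread count estimates elapsed time per thread. Using the 441:04:21 figure quoted in this thread:

```shell
# cput from qstat -f is total CPU time across all 20 threads.
cput_s=$((441*3600 + 4*60 + 21))   # 441:04:21 as seconds
threads=20
echo "$((cput_s / threads / 3600)) hours of elapsed time per thread"
# prints: 22 hours of elapsed time per thread
```

This matches the reported walltime of 22:05:56, which is why all threads appear to be busy.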
Since you haven't seen .pmap files yet and since multithreaded
execution is occurring, I expect that execution is currently at the
following place in the script:
<a class="moz-txt-link-freetext" href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a>
I am not familiar with the 'do_subset_partition' code, but will try
to analyze it later today. However, I would also listen to what Adina
is saying - this step may just take a long time....
Eric
P.S. If you want to check on the output from the script, you could
look in /var/spool/PBS/mom_priv (or equivalent) on the node where the
job is running to see what the spooled output looks like thus far.
(There should be a file named with the job ID and either a ".ER" or
".OU" extension, if I recall correctly, though it has been awhile
since I have administered your kind of batch system.) You may need
David to do this as the permissions to the directory are typically
restrictive.
On Thu, Mar 21, 2013 at 5:40 AM, Alexis Groppi
<<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>>
wrote:
One clarification:
the file submitted to do-partition.py contains 2,576,771
reads (file.below).
The job was launched with the following options:
khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20
file.graphbase file.below
Alexis
On 21/03/2013 10:13, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Eric,
The script do-partition.py has now been running for 22 hours.
Only the file.info file has been generated. No
.pmap files have been created.
qstat -f gives :
resources_used.cput = 441:04:21
resources_used.mem = 12764228kb
resources_used.vmem = 13926732kb
resources_used.walltime = 22:05:56
The amount of RAM on the server is 256 GB and the swap space is
also 256 GB.
What do you think?
Thanks
Alexis
On 20/03/2013 16:43, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Eric,
Actually, the previous job was killed when it hit the
walltime limit.
I relaunched the script.
qstat -fr gives:
resources_used.cput = 93:23:08
resources_used.mem = 12341932kb
resources_used.vmem = 13271372kb
resources_used.walltime = 04:42:39
At the moment, only the file.info file has been
generated.
Let's wait and see ...
Thanks again
Alexis
On 19/03/2013 21:50, Eric McDonald wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Alexis,
What does:
qstat -f <job-id>
(where <job-id> is the ID of your job) report for the
following fields:
resources_used.cput
resources_used.vmem
And how do those values compare to the actual elapsed
time of the job, the amount of physical memory on the node,
and the total memory (RAM + swap space) on the node?
Just checking to make sure that everything is running as it
should be and that your process is not heavily into swap or
something like that.
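With the figures Alexis reports elsewhere in the thread (vmem of 13926732 kB against 256 GB of RAM), a quick check along these lines shows the process is nowhere near swap:

```shell
# resources_used.vmem from qstat -f, compared against physical RAM;
# if vmem fits comfortably in RAM, the job is unlikely to be swapping.
vmem_kb=13926732                 # value reported by qstat -f
ram_gb=256                       # RAM on the node
vmem_gb=$((vmem_kb / 1024 / 1024))
echo "vmem ~${vmem_gb} GB of ${ram_gb} GB RAM"
# prints: vmem ~13 GB of 256 GB RAM
```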
Thanks,
Eric
On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi
<<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>> wrote:
Hi Adina,
First of all, thanks for your answer and your advice :)
The script extract-partitions.py works !
As for do-partition.py on my second set, it has now been
running for 32 hours. Shouldn't it have produced at least one
temporary .pmap file by now?
Thanks again
Alexis
On 19/03/2013 12:58, Adina Chuang Howe wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
Message: 1
Date: Tue, 19 Mar 2013 10:41:45 +0100
From: Alexis Groppi <<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>>
Subject: [khmer] Duration of do-partition.py (very
long !)
To: <a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>
Message-ID: <<a class="moz-txt-link-abbreviated" href="mailto:514832D9.7090207@u-bordeaux2.fr">514832D9.7090207@u-bordeaux2.fr</a>>
Content-Type: text/plain; charset="iso-8859-1";
Format="flowed"
Hi Titus,
After digital normalization and filter-below-abund, on your
advice I ran do-partition.py on
2 sets of data (approx. 2.5 million
reads of 75 nt each):
/khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
and
/khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
For the first one I got a
174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info file
with the information: 33 subsets total.
Thereafter, 33 .pmap files (0.pmap through 32.pmap) were
created at regular intervals, and finally I got a single file,
174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap
files were deleted).
This run took approx. 56 hours.
For the second set (174r2), do-partition.py has been running
for 32 hours, but so far I have only got the
174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info file
with the information: 35 subsets total.
And nothing more...
Is this duration "normal" ?
Yes, this is typical. The longest I've had it run is 3
weeks, for very large data sets (billions of reads). In
general, partitioning is the most time consuming of all the steps.
Once it's finished, you'll have much smaller files which
can be assembled very quickly. Since I run assemblies with
multiple assemblers and multiple K values, this gain
is often significant for me.
To get the actual partitioned files, you can use the
following script:
<a class="moz-txt-link-freetext" href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a>
(The thread count defaults to 4.)
33 subsets and only one file at the end ?
Should I stop do-partition.py on the second set and
re-run it with more
threads ?
I'd suggest letting it run.
Best,
Adina
_______________________________________________
khmer mailing list
<a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>
<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/khmer">http://lists.idyll.org/listinfo/khmer</a>
</pre>
</blockquote>
<pre wrap="">
--
-- Eric McDonald
HPC/Cloud Software Engineer
for the Institute for Cyber-Enabled Research (iCER)
and the Laboratory for Genomics, Evolution, and Development
(GED)
Michigan State University
P: 517-355-8733
</pre>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
--
--
Eric McDonald
HPC/Cloud Software Engineer
for the Institute for Cyber-Enabled Research (iCER)
and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733
</pre>
</blockquote>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
</pre>
<pre wrap="">
</pre>
</blockquote>
<br>
<div class="moz-signature">-- <br>
</div>
</body>
</html>