<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Titus,<br>
<br>
Thanks for your answer. The input file I am using should not contain
this artefact, because it has already been through filter-below-abund.<br>
I will try find-knots and then filter-stoptags.<br>
Regarding your last suggestion: what is the size limit?<br>
A side question: Eric told me "Titus created a guide about what
size hash table to generally use with certain kinds of data".<br>
If possible, I would be very interested in having this guide.<br>
<br>
Thanks again<br>
<br>
Alexis<br>
<br>
<div class="moz-cite-prefix">On 21/03/2013 14:14, C. Titus Brown
wrote:<br>
</div>
<blockquote cite="mid:78DDEAC6-43ED-4825-B61B-D57ABE904A05@msu.edu"
type="cite">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<div>This long wait is probably a sign that you have a highly
connected graph. We usually attribute that to the presence of
sequencing artifacts, which have to be removed either via
filter-below-abund or find-knots; do-partition can't do it
itself. Take a look at the handbook or the info on partitioning
large data.</div>
<div><br>
</div>
<div>In your case I think your data may be small enough to
assemble just after diginorm.<br>
<br>
<div>---</div>
C. Titus Brown, <a moz-do-not-send="true"
href="mailto:ctb@msu.edu">ctb@msu.edu</a></div>
<div><br>
On Mar 21, 2013, at 8:50, Eric McDonald &lt;<a
moz-do-not-send="true" href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>&gt;
wrote:<br>
<br>
</div>
<blockquote type="cite">
<div>
<div dir="ltr">Thanks for the information, Alexis. If you are
using 20 threads, then 441 / 20 is about 22 hours of elapsed
time. So, it appears that all of the threads are working.
(There is the possibility that they could be busy-waiting
somewhere, but I didn't see any explicit opportunities for
that from reading the 'do-partition.py' code.) Since you
haven't seen .pmap files yet and since multithreaded
execution is occurring, I expect that execution is currently
at the following place in the script:
<div>
<a moz-do-not-send="true"
href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a></div>
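<div><br>
</div>
<div style="">As a rough cross-check of the arithmetic above, here is a minimal Python sketch. It is only an illustration and not part of khmer: the helper names hms_to_hours and busy_threads are hypothetical, and the two time strings are the cput and walltime values Alexis reported from qstat -f. A ratio close to the number of requested threads (20 here) suggests that all of them are accumulating CPU time; it cannot tell busy-waiting apart from useful work.</div>
<pre>
```python
def hms_to_hours(hms):
    """Convert an 'HH:MM:SS' string, as printed by qstat, to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def busy_threads(cput, walltime):
    """Average number of busy threads: CPU time divided by wall-clock time."""
    return hms_to_hours(cput) / hms_to_hours(walltime)


# Values reported by qstat -f in this thread.
ratio = busy_threads("441:04:21", "22:05:56")
print(f"about {ratio:.1f} threads busy on average")
```
</pre>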
<div><br>
</div>
<div style="">I am not familiar with the
'do_subset_partition' code, but will try to analyze it
later today. However, I would also listen to what Adina is
saying - this step may just take a long time....</div>
<div style=""><br>
</div>
<div style="">Eric</div>
<div style=""><br>
</div>
<div style="">P.S. If you want to check on the output from
the script, you could look in /var/spool/PBS/mom_priv (or
equivalent) on the node where the job is running to see
what the spooled output looks like thus far. (There should
be a file named with the job ID and either a ".ER" or
".OU" extension, if I recall correctly, though it has been
a while since I have administered your kind of batch
system.) You may need David to do this as the permissions
to the directory are typically restrictive.</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Thu, Mar 21, 2013 at 5:40 AM,
Alexis Groppi <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:alexis.groppi@u-bordeaux2.fr"
target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF"> A clarification:<br>
<br>
The file submitted to the do-partition.py script (file.below)
contains 2576771 reads.<br>
The job was launched with the following options:<br>
khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20
file.graphbase file.below<br>
<br>
Alexis<br>
<br>
<br>
<div>On 21/03/2013 10:13, Alexis Groppi wrote:<br>
</div>
<div>
<div class="h5">
<blockquote type="cite"> Hi Eric,<br>
<br>
The do-partition.py script has now been running for
22 hours.<br>
So far only <a moz-do-not-send="true"
href="http://file.info" target="_blank">file.info</a>
has been generated. No .pmap files have been created.<br>
<br>
qstat -f gives:<br>
resources_used.cput = 441:04:21<br>
resources_used.mem = 12764228kb<br>
resources_used.vmem = 13926732kb<br>
resources_used.walltime = 22:05:56<br>
<br>
The server has 256 GB of RAM and
256 GB of swap space.<br>
<br>
What do you think?<br>
<br>
Thanks<br>
<br>
Alexis<br>
<br>
<div>On 20/03/2013 16:43, Alexis Groppi
wrote:<br>
</div>
<blockquote type="cite"> Hi Eric,<br>
<br>
Actually, the previous job was terminated when it
hit the walltime limit.<br>
I relaunched the script.<br>
qstat -fr gives:<br>
resources_used.cput = 93:23:08<br>
resources_used.mem = 12341932kb<br>
resources_used.vmem = 13271372kb<br>
resources_used.walltime = 04:42:39<br>
<br>
So far, only <a
moz-do-not-send="true"
href="http://file.info" target="_blank">file.info</a>
has been generated.<br>
<br>
Let's wait and see ...<br>
<br>
Thanks again<br>
<br>
Alexis<br>
<br>
<br>
<div>On 19/03/2013 21:50, Eric McDonald
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Alexis,
<div><br>
</div>
<div>What does:</div>
<div> qstat -f &lt;job-id&gt;</div>
<div>where &lt;job-id&gt; is the ID of
your job tell you for the following
fields:</div>
<div> resources_used.cput</div>
<div> resources_used.vmem</div>
<div><br>
</div>
<div>And how do those values compare to
the actual amount of elapsed time for the
job, the amount of physical memory on
the node, and the total memory (RAM +
swap space) on the node?</div>
<div>Just checking to make sure that
everything is running as it should be
and that your process is not heavily
into swap or something like that.</div>
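<div><br>
</div>
<div>One way to make the memory comparison concrete is sketched below in Python. It is a hedged illustration only: the helper names kb_to_gib and ram_fraction are hypothetical, the "kb" unit printed by qstat is assumed to mean 1024 bytes, and the 256 GB of RAM that Alexis reports for the node is treated as 256 GiB. The vmem figure is the one from his later qstat -f output in this thread.</div>
<pre>
```python
def kb_to_gib(value):
    """Convert a size string such as '13926732kb' to GiB (assumes 1 kb = 1024 bytes)."""
    if not value.endswith("kb"):
        raise ValueError(f"unexpected unit in {value!r}")
    return int(value[:-2]) / 1024 ** 2


def ram_fraction(vmem, ram_gib):
    """Fraction of physical RAM that the job's virtual memory would fill."""
    return kb_to_gib(vmem) / ram_gib


# vmem as reported by qstat -f in this thread; 256 is the node's RAM, assumed to be in GiB.
fraction = ram_fraction("13926732kb", ram_gib=256)
print(f"vmem is {fraction:.1%} of physical RAM")
```
</pre>
<div>A fraction well below 1 means the job's virtual memory fits in physical RAM, so heavy swapping is unlikely to be the cause of a long run time.</div>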
<div><br>
</div>
<div>Thanks,</div>
<div> Eric</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Tue, Mar 19,
2013 at 11:23 AM, Alexis Groppi <span
dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:alexis.groppi@u-bordeaux2.fr"
target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
Hi Adina,<br>
<br>
First of all, thanks for your answer
and your advice :)<br>
The extract-partitions.py script
works!<br>
As for do-partition.py on my second
set, it has been running for 32 hours.
Shouldn't it have produced at least one
temporary .pmap file by now?<br>
<br>
Thanks again<br>
<br>
Alexis<br>
<br>
<div>On 19/03/2013 12:58, Adina
Chuang Howe wrote:<br>
</div>
<blockquote type="cite">
<div>
<div><br>
<br>
<div class="gmail_quote">
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
Message: 1<br>
Date: Tue, 19 Mar 2013
10:41:45 +0100<br>
From: Alexis Groppi &lt;<a
moz-do-not-send="true"
href="mailto:alexis.groppi@u-bordeaux2.fr"
target="_blank">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>
Subject: [khmer] Duration
of do-partition.py (very
long !)<br>
To: <a
moz-do-not-send="true"
href="mailto:khmer@lists.idyll.org"
target="_blank">khmer@lists.idyll.org</a><br>
Message-ID: &lt;<a
moz-do-not-send="true"
href="mailto:514832D9.7090207@u-bordeaux2.fr"
target="_blank">514832D9.7090207@u-bordeaux2.fr</a>&gt;<br>
Content-Type: text/plain;
charset="iso-8859-1";
Format="flowed"<br>
<br>
Hi Titus,<br>
<br>
After digital
normalization and
filter-below-abund, on
your advice I<br>
ran do-partition.py
on 2 sets of data (approx.
2.5 million<br>
reads of 75 nt each):<br>
<br>
/khmer-BETA/scripts/do-partition.py
-k 20 -x 1e9<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below<br>
and<br>
/khmer-BETA/scripts/do-partition.py
-k 20 -x 1e9<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase<br>
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below<br>
<br>
For the first one I got a<br>
<a moz-do-not-send="true"
href="http://174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info"
target="_blank">174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info</a>
with the<br>
information: 33 subsets
total.<br>
After that, 33 .pmap files
(0.pmap to 32.pmap)
were created at regular intervals,<br>
and finally I got a single
file,<br>
174r1_prinseq_good_bFr8.fasta.keep.below.part
(all the .pmap files were<br>
deleted).<br>
This run took
approx. 56 hours.<br>
<br>
For the second set
(174r2), do-partition.py
has been running for 32 hours,<br>
but so far I have only got the<br>
<a moz-do-not-send="true"
href="http://174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info"
target="_blank">174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info</a>
with the<br>
information: 35 subsets
total.<br>
Nothing more...<br>
<br>
Is this duration "normal"?<br>
</blockquote>
<div><br>
</div>
<div>Yes, this is typical.
The longest I've had it
run is 3 weeks, for a very
large data set (billions of reads).
In general, partitioning
is the most time-consuming
of all the steps. Once
it's finished, you'll have
much smaller files which
can be assembled very
quickly. Since I run
assembly with multiple
assemblers and with
multiple k-mer lengths, this
gain is often significant
for me. </div>
<div><br>
</div>
<div>To get the actual
partitioned files, you can
use the following script:</div>
<div><br>
</div>
<div><a
moz-do-not-send="true"
href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py"
target="_blank">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a></div>
<div><br>
</div>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">
(The thread setting was
left at the default (4
threads).)<br>
33 subsets and only one
file at the end?<br>
Should I stop
do-partition.py on the
second set and rerun it
with more<br>
threads?<br>
<br>
</blockquote>
<div><br>
</div>
<div>I'd suggest letting it
run.</div>
<div><br>
</div>
<div>Best,</div>
<div>Adina</div>
</div>
<br>
<fieldset></fieldset>
<br>
</div>
</div>
<pre>_______________________________________________
khmer mailing list
<a moz-do-not-send="true" href="mailto:khmer@lists.idyll.org" target="_blank">khmer@lists.idyll.org</a>
<a moz-do-not-send="true" href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><span><font color="#888888">
</font></span></pre>
<span><font color="#888888"> </font></span></blockquote>
<span><font color="#888888"> <br>
</font></span></div>
<br>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div>Eric McDonald</div>
<div>HPC/Cloud Software Engineer</div>
<div> for the Institute for
Cyber-Enabled Research (iCER)</div>
<div> and the Laboratory for Genomics,
Evolution, and Development (GED)</div>
<div>Michigan State University</div>
<div>P: <a moz-do-not-send="true"
href="tel:517-355-8733"
value="+15173558733" target="_blank">517-355-8733</a></div>
</div>
</div>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</div>
</div>
<span class="HOEnZb"><font color="#888888">
</font></span></div>
</blockquote>
</div>
<br>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div>Eric McDonald</div>
<div>HPC/Cloud Software Engineer</div>
<div> for the Institute for Cyber-Enabled Research (iCER)</div>
<div> and the Laboratory for Genomics, Evolution, and
Development (GED)</div>
<div>Michigan State University</div>
<div>P: 517-355-8733</div>
</div>
</div>
</div>
</blockquote>
</blockquote>
<br>
<div class="moz-signature">-- <br>
<img src="cid:part22.07030007.02090608@u-bordeaux2.fr" border="0"></div>
</body>
</html>