<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Sorry to bother you, but one thing is still unclear to me: <br>
<br>
To remove the artefacts: <br>
should I run find-knots on my file.below (i.e. after
normalize-by-median.py, load-into-counting.py and
filter-below-abund.py)?<br>
Then filter-stoptags?<br>
And will the data then be ready for assembly, or should I still run
do-partition.py on the artefact-free data?<br>
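For orientation, the order being asked about can be sketched using only the scripts named in this thread. This is a hypothetical sketch, not a prescribed command sequence: filenames are placeholders and each script's required options are elided.

```shell
# Hypothetical sketch only; filenames are placeholders and the
# elided "..." arguments stand for options not shown in this thread.
normalize-by-median.py reads.fasta                 # digital normalization
load-into-counting.py counts.kh reads.fasta.keep
filter-below-abund.py counts.kh reads.fasta.keep   # produces file.below
find-knots.py ...        # remove highly connected sequencing artefacts
filter-stoptags.py ...   # filter reads crossing the identified knots
do-partition.py ...      # then partition before assembly
```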
<br>
Thanks<br>
<br>
Alexis
<div class="moz-cite-prefix">On 21/03/2013 15:28, C. Titus Brown
wrote:<br>
</div>
<blockquote cite="mid:20130321142820.GA30052@idyll.org" type="cite">
<pre wrap="">On Thu, Mar 21, 2013 at 03:15:33PM +0100, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Thanks for your answer. The input file I am using should not contain
these artefacts, because it was produced by filter-below-abund.
I will try find-knots and then filter-stoptags.
About your last suggestion: what is the size limit?
A side question: Eric told me "Titus created a guide about what size
hash table to generally use with certain kinds of data."
If possible, I would be very interested in that guide.
</pre>
</blockquote>
<pre wrap="">
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/">http://khmer.readthedocs.org/en/latest/</a>
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/choosing-hash-sizes.html">http://khmer.readthedocs.org/en/latest/choosing-hash-sizes.html</a>
OK, you may have to use the find-knots stuff --
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/partitioning-big-data.html">http://khmer.readthedocs.org/en/latest/partitioning-big-data.html</a>
cheers,
--titus
</pre>
<blockquote type="cite">
<pre wrap="">On 21/03/2013 14:14, C. Titus Brown wrote:
</pre>
<blockquote type="cite">
<pre wrap="">This long wait is probably a sign that you have a highly connected
graph. We usually attribute that to the presence of sequencing
artifacts, which have to be removed either via filter-below-abund or
find-knots; do-partition can't do it by itself. Take a look at the
handbook or the info on partitioning large data sets.
In your case I think your data may be small enough to assemble just
after diginorm.
---
C. Titus Brown, <a class="moz-txt-link-abbreviated" href="mailto:ctb@msu.edu">ctb@msu.edu</a>
On Mar 21, 2013, at 8:50, Eric McDonald <<a class="moz-txt-link-abbreviated" href="mailto:emcd.msu@gmail.com">emcd.msu@gmail.com</a>> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Thanks for the information, Alexis. If you are using 20 threads, then
441 / 20 is about 22 hours of elapsed time. So, it appears that all
of the threads are working. (There is the possibility that they could
be busy-waiting somewhere, but I didn't see any explicit
opportunities for that from reading the 'do-partition.py' code.)
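The arithmetic above can be checked directly: `resources_used.cput` is CPU time summed over all threads, so dividing by the thread count estimates elapsed time per thread. Using the 441:04:21 figure quoted in this thread:

```shell
# cput from qstat -f is total CPU time across all 20 threads.
cput_s=$((441*3600 + 4*60 + 21))   # 441:04:21 as seconds
threads=20
echo "$((cput_s / threads / 3600)) hours of elapsed time per thread"
# prints: 22 hours of elapsed time per thread
```

This matches the reported walltime of 22:05:56, which is why all threads appear to be busy.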
Since you haven't seen .pmap files yet and since multithreaded
execution is occurring, I expect that execution is currently at the
following place in the script:
<a class="moz-txt-link-freetext" href="https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57">https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/do-partition.py#L57</a>
I am not familiar with the 'do_subset_partition' code, but will try
to analyze it later today. However, I would also listen to what Adina
is saying - this step may just take a long time....
Eric
P.S. If you want to check on the output from the script, you could
look in /var/spool/PBS/mom_priv (or equivalent) on the node where the
job is running to see what the spooled output looks like thus far.
(There should be a file named with the job ID and either a ".ER" or
".OU" extension, if I recall correctly, though it has been awhile
since I have administered your kind of batch system.) You may need
David to do this as the permissions to the directory are typically
restrictive.
On Thu, Mar 21, 2013 at 5:40 AM, Alexis Groppi
<<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>>
wrote:
One clarification:
the file submitted to do-partition.py contains 2,576,771
reads (file.below).
The job was launched with the following options:
khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 -T 20
file.graphbase file.below
Alexis
On 21/03/2013 10:13, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Eric,
The script do-partition.py has now been running for 22 hours.
Only the file.info file has been generated. No
.pmap files have been created.
qstat -f gives :
resources_used.cput = 441:04:21
resources_used.mem = 12764228kb
resources_used.vmem = 13926732kb
resources_used.walltime = 22:05:56
The amount of RAM on the server is 256 GB and the swap space is
also 256 GB.
What do you think?
Thanks
Alexis
On 20/03/2013 16:43, Alexis Groppi wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Eric,
Actually, the previous job was killed when it hit the
walltime limit.
I relaunched the script.
qstat -fr gives:
resources_used.cput = 93:23:08
resources_used.mem = 12341932kb
resources_used.vmem = 13271372kb
resources_used.walltime = 04:42:39
At the moment, only the file.info file has been
generated.
Let's wait and see ...
Thanks again
Alexis
On 19/03/2013 21:50, Eric McDonald wrote:
</pre>
<blockquote type="cite">
<pre wrap=""> Hi Alexis,
What does:
qstat -f <job-id>
(where <job-id> is the ID of your job) report for the
following fields:
resources_used.cput
resources_used.vmem
And how do those values compare to the actual elapsed
time of the job, the amount of physical memory on the node,
and the total memory (RAM + swap space) on the node?
Just checking to make sure that everything is running as it
should be and that your process is not heavily into swap or
something like that.
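With the figures Alexis reports elsewhere in the thread (vmem of 13926732 kB against 256 GB of RAM), a quick check along these lines shows the process is nowhere near swap:

```shell
# resources_used.vmem from qstat -f, compared against physical RAM;
# if vmem fits comfortably in RAM, the job is unlikely to be swapping.
vmem_kb=13926732                 # value reported by qstat -f
ram_gb=256                       # RAM on the node
vmem_gb=$((vmem_kb / 1024 / 1024))
echo "vmem ~${vmem_gb} GB of ${ram_gb} GB RAM"
# prints: vmem ~13 GB of 256 GB RAM
```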
Thanks,
Eric
On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi
<<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>> wrote:
Hi Adina,
First of all, thanks for your answer and your advice :)
The script extract-partitions.py works !
As for do-partition.py on my second set, it has now been
running for 32 hours. Shouldn't it have produced at least one
temporary .pmap file by now?
Thanks again
Alexis
On 19/03/2013 12:58, Adina Chuang Howe wrote:
</pre>
<blockquote type="cite">
<pre wrap="">
Message: 1
Date: Tue, 19 Mar 2013 10:41:45 +0100
From: Alexis Groppi <<a class="moz-txt-link-abbreviated" href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>>
Subject: [khmer] Duration of do-partition.py (very
long !)
To: <a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>
Message-ID: <<a class="moz-txt-link-abbreviated" href="mailto:514832D9.7090207@u-bordeaux2.fr">514832D9.7090207@u-bordeaux2.fr</a>>
Content-Type: text/plain; charset="iso-8859-1";
Format="flowed"
Hi Titus,
After digital normalization and filter-below-abund, on your
advice I ran do-partition.py on
2 sets of data (approx. 2.5 million
reads of 75 nt each):
/khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
/ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
and
/khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
/ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
For the first one I got a
174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info file
with the information: 33 subsets total.
Thereafter, 33 .pmap files (0.pmap through 32.pmap) were
created at regular intervals, and finally I got a single file,
174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap
files were deleted).
This run took approx. 56 hours.
For the second set (174r2), do-partition.py has been running
for 32 hours, but so far I have only got the
174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info file
with the information: 35 subsets total.
And nothing more...
Is this duration "normal" ?
Yes, this is typical. The longest I've had it run is 3
weeks, for very large data sets (billions of reads). In
general, partitioning is the most time consuming of all the steps.
Once it's finished, you'll have much smaller files which
can be assembled very quickly. Since I run assemblies with
multiple assemblers and multiple K values, this gain
is often significant for me.
To get the actual partitioned files, you can use the
following script:
<a class="moz-txt-link-freetext" href="https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py">https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py</a>
(The thread count defaults to 4.)
33 subsets and only one file at the end ?
Should I stop do-partition.py on the second set and
re-run it with more
threads ?
I'd suggest letting it run.
Best,
Adina
_______________________________________________
khmer mailing list
<a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>
<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/khmer">http://lists.idyll.org/listinfo/khmer</a>
</pre>
</blockquote>
<pre wrap="">
--
-- Eric McDonald
HPC/Cloud Software Engineer
for the Institute for Cyber-Enabled Research (iCER)
and the Laboratory for Genomics, Evolution, and Development
(GED)
Michigan State University
P: 517-355-8733
</pre>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
--
--
Eric McDonald
HPC/Cloud Software Engineer
for the Institute for Cyber-Enabled Research (iCER)
and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733
</pre>
</blockquote>
</blockquote>
<pre wrap="">
--
</pre>
</blockquote>
<pre wrap="">
</pre>
<pre wrap="">
</pre>
</blockquote>
<br>
<div class="moz-signature">-- <br>
</div>
</body>
</html>