[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)

Alexis Groppi alexis.groppi at u-bordeaux2.fr
Wed Mar 20 08:43:20 PDT 2013


Hi Eric,

Actually, the previous job was terminated because it hit the walltime limit.
I relaunched the script.
qstat -fr gives:
     resources_used.cput = 93:23:08
     resources_used.mem = 12341932kb
     resources_used.vmem = 13271372kb
     resources_used.walltime = 04:42:39

At this point, only the .info file has been generated.

Let's wait and see ...

Thanks again

Alexis


On 19/03/2013 21:50, Eric McDonald wrote:
> Hi Alexis,
>
> What does:
>   qstat -f <job-id>
> (where <job-id> is the ID of your job) tell you for the following fields:
>   resources_used.cput
>   resources_used.vmem
>
> And how do those values compare to the actual amount of elapsed time for
> the job, the amount of physical memory on the node, and the total
> memory (RAM + swap space) on the node?
> Just checking to make sure that everything is running as it should be 
> and that your process is not heavily into swap or something like that.
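>
> A quick way to sanity-check that on the node (just a rough sketch; <pid>
> and <node-name> are placeholders, and the exact commands depend on the
> cluster setup) might be:
>
>   free -g                          # total RAM and swap on the node
>   grep VmSwap /proc/<pid>/status   # swap used by the running process (Linux)
>   pbsnodes <node-name> | grep -o 'physmem=[^,]*'   # physical memory per TORQUE/PBS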
>
> Thanks,
>   Eric
>
>
>
> On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi
> <alexis.groppi at u-bordeaux2.fr> wrote:
>
>     Hi Adina,
>
>     First of all, thanks for your answer and your advice :)
>     The extract-partitions.py script works!
>     do-partition.py has now been running on my second set for 32 hours.
>     Shouldn't it have produced at least one temporary .pmap file by now?
>
>     Thanks again
>
>     Alexis
>
>     On 19/03/2013 12:58, Adina Chuang Howe wrote:
>>
>>
>>         Message: 1
>>         Date: Tue, 19 Mar 2013 10:41:45 +0100
>>         From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
>>         Subject: [khmer] Duration of do-partition.py (very long !)
>>         To: khmer at lists.idyll.org
>>         Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
>>         Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>>         Hi Titus,
>>
>>         After digital normalization and filter-below-abund, upon your
>>         advice I ran do-partition.py on 2 sets of data (approx. 2.5
>>         million reads, 75 nt):
>>
>>         /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>         /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
>>         /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
>>         and
>>         /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>         /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
>>         /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>>
>>         For the first one I got a
>>         174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info file
>>         reporting: 33 subsets total
>>         Thereafter, 33 .pmap files (0.pmap through 32.pmap) were created
>>         at regular intervals, and finally I got a single file,
>>         174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap
>>         files were deleted).
>>         This run took approximately 56 hours.
>>
>>         For the second set (174r2), do-partition.py has been running
>>         for 32 hours, but so far I only have the
>>         174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info file
>>         reporting: 35 subsets total
>>         And nothing more...
>>
>>         Is this duration "normal"?
>>
>>
>>     Yes, this is typical.  The longest I've had it run is 3 weeks, for
>>     a very large dataset (billions of reads).  In general, partitioning
>>     is the most time-consuming of all the steps.  Once it's finished,
>>     you'll have much smaller files that can be assembled very quickly.
>>     Since I run assemblies with multiple assemblers and multiple K
>>     lengths, this gain is often significant for me.
>>
>>     To get the actual partitioned files, you can use the following
>>     script:
>>
>>     https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
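>>
>>     A minimal invocation (just a sketch; the "174r1_parts" output prefix
>>     is arbitrary, and the script path assumes the same /khmer-BETA
>>     checkout used for do-partition.py above) might look like:
>>
>>       /khmer-BETA/scripts/extract-partitions.py 174r1_parts \
>>           174r1_prinseq_good_bFr8.fasta.keep.below.part
>>
>>     This should write grouped FASTA files (e.g. 174r1_parts.group0000.fa)
>>     plus a 174r1_parts.dist partition-size distribution.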
>>
>>         (The thread setting is at its default of 4 threads.)
>>         33 subsets and only one file at the end?
>>         Should I stop do-partition.py on the second set and re-run it
>>         with more threads?
>>
>>
>>     I'd suggest letting it run.
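>>
>>     If you did want more threads on a future run, the invocation would
>>     be along these lines (note: the --threads flag name is an assumption
>>     about this khmer version; check do-partition.py --help):
>>
>>       /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9 --threads 8 \
>>           /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase \
>>           /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below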
>>
>>     Best,
>>     Adina
>>
>>
>
>
>
>
>
> -- 
> Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733

