[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)

Alexis Groppi alexis.groppi at u-bordeaux2.fr
Thu Mar 21 02:13:30 PDT 2013


Hi Eric,

The script do-partition.py has now been running for 22 hours.
Only the .info file has been generated; no .pmap files have been created.
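One quick check is to compare the subset count in the .info file with the number of .pmap files on disk. The Python sketch below does this. It rests on two assumptions taken from this thread, not from khmer's documentation: the .info file contains text such as "35 subsets total", and the temporary .pmap files are written next to it. `count_subsets` and `partition_progress` are hypothetical helper names and are not part of khmer.

```python
import re
from pathlib import Path


def count_subsets(info_text):
    """Extract N from text such as '33 subsets total'.

    The format is assumed from the descriptions in this thread and may
    differ between khmer versions.
    """
    match = re.search(r"(\d+)\s+subsets\s+total", info_text)
    if match is None:
        raise ValueError("no 'N subsets total' text found")
    return int(match.group(1))


def partition_progress(info_path, pmap_dir=None):
    """Return (pmap_files_present, subsets_total) for a partitioning run.

    pmap_dir defaults to the directory holding the .info file. That is an
    assumption about where the temporary .pmap files are written.
    """
    info_path = Path(info_path)
    total = count_subsets(info_path.read_text())
    directory = Path(pmap_dir) if pmap_dir is not None else info_path.parent
    # Counts every *.pmap file in the directory, including files from other runs.
    present = len(list(directory.glob("*.pmap")))
    return present, total
```

If those assumptions hold, running it against 174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info at this stage should report (0, 35). The glob counts every .pmap file in the directory. Also, the .pmap files are deleted once the final .part file is written, so a count of 0 by itself cannot tell "not started yet" apart from "already finished".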

qstat -f gives:
     resources_used.cput = 441:04:21
     resources_used.mem = 12764228kb
     resources_used.vmem = 13926732kb
     resources_used.walltime = 22:05:56

The server has 256 GB of RAM and 256 GB of swap space.
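Eric's questions (how cput and vmem compare with elapsed time and with the memory on the node) reduce to simple arithmetic on these fields. The Python sketch below parses the `key = value` lines shown above, converts `HH:MM:SS` to seconds and `kb` to GiB, and reports two numbers: CPU seconds per wall-clock second, which is roughly the average number of busy cores, and vmem as a fraction of RAM. It assumes that Torque/PBS `kb` means 1024 bytes and treats the server's 256 GB of RAM as 256 GiB. The helper names are hypothetical.

```python
def hms_to_seconds(value):
    """Convert a Torque/PBS 'HH:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def kb_to_gib(value):
    """Convert a field such as '12764228kb' to GiB (assumes 1 kb = 1024 bytes)."""
    text = value.strip().lower()
    if not text.endswith("kb"):
        raise ValueError(f"unexpected memory value: {value!r}")
    return int(text[:-2]) / (1024 * 1024)


def parse_qstat_fields(text):
    """Collect 'key = value' lines from `qstat -f` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(fields, ram_gib):
    """Compare CPU time with walltime, and virtual memory with physical RAM."""
    cput = hms_to_seconds(fields["resources_used.cput"])
    wall = hms_to_seconds(fields["resources_used.walltime"])
    vmem = kb_to_gib(fields["resources_used.vmem"])
    return {
        "cpu_per_wall": cput / wall,
        "vmem_gib": vmem,
        "vmem_fraction_of_ram": vmem / ram_gib,
    }


snapshot = """\
     resources_used.cput = 441:04:21
     resources_used.mem = 12764228kb
     resources_used.vmem = 13926732kb
     resources_used.walltime = 22:05:56
"""

report = summarize(parse_qstat_fields(snapshot), ram_gib=256)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

For this 22-hour snapshot, the ratio works out to about 20 (1,587,861 CPU seconds over 79,556 wall-clock seconds). The vmem is roughly 13.3 GiB, or about 5% of RAM. Together these suggest the process is busy computing rather than swapping.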

What do you think?

Thanks

Alexis

On 20/03/2013 16:43, Alexis Groppi wrote:
> Hi Eric,
>
> Actually, the previous job was killed because it hit the walltime limit.
> I relaunched the script.
> qstat -fr gives:
>     resources_used.cput = 93:23:08
>     resources_used.mem = 12341932kb
>     resources_used.vmem = 13271372kb
>     resources_used.walltime = 04:42:39
>
> So far, only the .info file has been generated.
>
> Let's wait and see...
>
> Thanks again
>
> Alexis
>
>
> On 19/03/2013 21:50, Eric McDonald wrote:
>> Hi Alexis,
>>
>> What does
>>   qstat -f <job-id>
>> (where <job-id> is the ID of your job) report for the following fields?
>>   resources_used.cput
>>   resources_used.vmem
>>
>> And how do those values compare to the actual elapsed time for 
>> the job, the amount of physical memory on the node, and the total 
>> memory (RAM + swap space) on the node?
>> I just want to make sure that everything is running as it should 
>> and that your process is not heavily into swap or anything like that.
>>
>> Thanks,
>>   Eric
>>
>>
>>
>> On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi
>> <alexis.groppi at u-bordeaux2.fr> wrote:
>>
>>     Hi Adina,
>>
>>     First of all, thanks for your answer and your advice :)
>>     The script extract-partitions.py works !
>>     As for do-partition.py on my second set, it has been running for 32 hours.
>>     Shouldn't it have produced at least one temporary .pmap file by now?
>>
>>     Thanks again
>>
>>     Alexis
>>
>>     On 19/03/2013 12:58, Adina Chuang Howe wrote:
>>>
>>>
>>>         Message: 1
>>>         Date: Tue, 19 Mar 2013 10:41:45 +0100
>>>         From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
>>>         Subject: [khmer] Duration of do-partition.py (very long !)
>>>         To: khmer at lists.idyll.org
>>>         Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
>>>         Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>>         Hi Titus,
>>>
>>>         On your advice, after digital normalization and
>>>         filter-below-abund, I ran do-partition.py on 2 data sets
>>>         (approx. 2.5 million reads, 75 nt):
>>>
>>>         /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>>         /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
>>>         /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
>>>         and
>>>         /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>>         /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
>>>         /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>>>
>>>         For the first one, I got a
>>>         174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
>>>         file reporting: 33 subsets total.
>>>         After that, 33 .pmap files (0.pmap to 32.pmap) were
>>>         regularly created, and finally I got a single file,
>>>         174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap
>>>         files were deleted).
>>>         This run took approx. 56 hours.
>>>
>>>         For the second set (174r2), do-partition.py has been running
>>>         for 32 hours, but I have only got the
>>>         174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
>>>         file, reporting: 35 subsets total.
>>>         And nothing more...
>>>
>>>         Is this duration "normal"?
>>>
>>>
>>>     Yes, this is typical.  The longest I've had it run is 3 weeks,
>>>     for very large datasets (billions of reads).  In general,
>>>     partitioning is the most time-consuming of all the steps.  Once
>>>     it's finished, you'll have much smaller files, which can be
>>>     assembled very quickly.  Since I run assemblies with multiple
>>>     assemblers and multiple k lengths, this gain is often
>>>     significant for me.
>>>
>>>     To get the actual partitioned files, you can use the following
>>>     script:
>>>
>>>     https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
>>>
>>>         (The thread parameter is left at its default of 4 threads.)
>>>         33 subsets and only one file at the end?
>>>         Should I stop do-partition.py on the second set and rerun
>>>         it with more threads?
>>>
>>>
>>>     I'd suggest letting it run.
>>>
>>>     Best,
>>>     Adina
>>>
>>>
>>>     _______________________________________________
>>>     khmer mailing list
>>>     khmer at lists.idyll.org
>>>     http://lists.idyll.org/listinfo/khmer
>>
>>
>> -- 
>> Eric McDonald
>> HPC/Cloud Software Engineer
>>   for the Institute for Cyber-Enabled Research (iCER)
>>   and the Laboratory for Genomics, Evolution, and Development (GED)
>> Michigan State University
>> P: 517-355-8733
>