[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)
Alexis Groppi
alexis.groppi at u-bordeaux2.fr
Thu Mar 21 02:13:30 PDT 2013
Hi Eric,
The do-partition.py script has now been running for 22 hours.
Only the .info file has been generated. No .pmap files have been created.
qstat -f gives:
resources_used.cput = 441:04:21
resources_used.mem = 12764228kb
resources_used.vmem = 13926732kb
resources_used.walltime = 22:05:56
The server has 256 GB of RAM and 256 GB of swap space.
What do you think?
Thanks
Alexis
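
For anyone reading along, the qstat figures above can be put side by side with a few lines of Python. This is only a minimal sketch, not part of khmer or PBS: the helper names (parse_qstat_fields, hms_to_seconds, kb_to_gib, summarize) are hypothetical, it assumes qstat -f reports durations as HH:MM:SS and memory as an integer followed by "kb", and it treats the 256 GB of RAM as 256 GiB for simplicity.

```python
def parse_qstat_fields(text):
    """Collect 'key = value' lines from qstat -f style output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def hms_to_seconds(value):
    """Convert 'HH:MM:SS' (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def kb_to_gib(value):
    """Convert a string such as '12764228kb' to GiB."""
    if not value.lower().endswith("kb"):
        raise ValueError("expected a value in kb, got %r" % value)
    return int(value[:-2]) / 1024 ** 2


def summarize(fields, ram_gib):
    """Compare CPU time with walltime and resident memory with physical RAM."""
    cput = hms_to_seconds(fields["resources_used.cput"])
    wall = hms_to_seconds(fields["resources_used.walltime"])
    mem_gib = kb_to_gib(fields["resources_used.mem"])
    return {
        "busy_cores": cput / wall,        # average cores kept busy
        "mem_gib": mem_gib,               # resident memory
        "vmem_gib": kb_to_gib(fields["resources_used.vmem"]),
        "ram_fraction": mem_gib / ram_gib,
    }


# Values copied from the qstat -f output in this message.
QSTAT_OUTPUT = """
resources_used.cput = 441:04:21
resources_used.mem = 12764228kb
resources_used.vmem = 13926732kb
resources_used.walltime = 22:05:56
"""

summary = summarize(parse_qstat_fields(QSTAT_OUTPUT), ram_gib=256)
for name, value in summary.items():
    print("%s: %.2f" % (name, value))
```

With these values the job averages roughly 20 busy cores and holds about 12 GiB resident, around 5% of the RAM, which suggests it is computing rather than swapping.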
On 20/03/2013 16:43, Alexis Groppi wrote:
> Hi Eric,
>
> Actually, the previous job was terminated when it hit the walltime limit.
> I relaunched the script.
> qstat -fr gives:
> resources_used.cput = 93:23:08
> resources_used.mem = 12341932kb
> resources_used.vmem = 13271372kb
> resources_used.walltime = 04:42:39
>
> So far, only the .info file has been generated.
>
> Let's wait and see ...
>
> Thanks again
>
> Alexis
>
>
> On 19/03/2013 21:50, Eric McDonald wrote:
>> Hi Alexis,
>>
>> What does:
>> qstat -f <job-id>
>> where <job-id> is the ID of your job tell you for the following fields:
>> resources_used.cput
>> resources_used.vmem
>>
>> And how do those values compare to the actual amount of elapsed time for
>> the job, the amount of physical memory on the node, and the total
>> memory (RAM + swap space) on the node?
>> Just checking to make sure that everything is running as it should be
>> and that your process is not heavily into swap or something like that.
>>
>> Thanks,
>> Eric
>>
>>
>>
>> On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi
>> <alexis.groppi at u-bordeaux2.fr>
>> wrote:
>>
>> Hi Adina,
>>
>> First of all, thanks for your answer and your advice :)
>> The extract-partitions.py script works!
>> As for do-partition.py on my second set, it has been running for 32 hours.
>> Shouldn't it have produced at least one temporary .pmap file by now?
>>
>> Thanks again
>>
>> Alexis
>>
>> On 19/03/2013 12:58, Adina Chuang Howe wrote:
>>>
>>>
>>> Message: 1
>>> Date: Tue, 19 Mar 2013 10:41:45 +0100
>>> From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
>>> Subject: [khmer] Duration of do-partition.py (very long !)
>>> To: khmer at lists.idyll.org
>>> Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> Hi Titus,
>>>
>>> After digital normalization and filter-below-abund, following
>>> your advice, I ran do-partition.py on 2 data sets
>>> (approx. 2.5 million 75-nt reads):
>>>
>>> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
>>> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
>>> and
>>> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>>> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
>>> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>>>
>>> For the first one I got a
>>> 174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
>>> file containing the line: 33 subsets total
>>> After that, 33 .pmap files (0.pmap to 32.pmap) were created at
>>> regular intervals, and finally I got a single file,
>>> 174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap
>>> files were deleted).
>>> This run took approx. 56 hours.
>>>
>>> For the second set (174r2), do-partition.py has been running for
>>> 32 hours, but so far I only have the
>>> 174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
>>> file, containing the line: 35 subsets total
>>> And nothing more...
>>>
>>> Is this duration "normal"?
>>>
>>>
>>> Yes, this is typical. The longest I've had it run is 3 weeks,
>>> for a very large data set (billions of reads). In general,
>>> partitioning is the most time-consuming of all the steps. Once
>>> it's finished, you'll have much smaller files which can be
>>> assembled very quickly. Since I run assembly with multiple
>>> assemblers and multiple K lengths, this gain is often
>>> significant for me.
>>>
>>> To get the actual partitioned files, you can use the following
>>> script:
>>>
>>> https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
>>>
>>> (The thread count was left at the default, 4 threads.)
>>> 33 subsets and only one file at the end?
>>> Should I stop do-partition.py on the second set and rerun
>>> it with more threads?
>>>
>>>
>>> I'd suggest letting it run.
>>>
>>> Best,
>>> Adina
>>>
>>>
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
>>
>> --
>> Eric McDonald
>> HPC/Cloud Software Engineer
>> for the Institute for Cyber-Enabled Research (iCER)
>> and the Laboratory for Genomics, Evolution, and Development (GED)
>> Michigan State University
>> P: 517-355-8733
>