[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)

Eric McDonald emcd.msu at gmail.com
Tue Mar 19 13:50:11 PDT 2013


Hi Alexis,

What does
  qstat -f <job-id>
(where <job-id> is the ID of your job) report for the following fields:
  resources_used.cput
  resources_used.vmem

And how do those values compare to the actual elapsed time for the
job, the amount of physical memory on the node, and the total memory (RAM +
swap space) on the node?
I am just checking that everything is running as it should and
that your process is not swapping heavily or something like that.
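
If it helps, the comparison can be scripted. What follows is only a rough
sketch in Python, not part of khmer. It assumes the qstat -f output contains
"key = value" lines, that times are printed as HH:MM:SS and sizes with a
suffix such as kb, and that a resources_used.walltime field is present
alongside the two fields above. The function names and the sample values are
made up for illustration; the physical RAM figure is something you would look
up for your node.

```python
import re

# Made-up sample in the assumed "key = value" layout of `qstat -f` output.
SAMPLE = """Job Id: 12345.example
    resources_used.cput = 08:00:00
    resources_used.vmem = 2097152kb
    resources_used.walltime = 32:00:00
"""


def parse_qstat_fields(text):
    """Collect 'key = value' pairs from qstat -f style output."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\w.]+)\s*=\s*(.+)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def hms_to_seconds(value):
    """Convert 'HH:MM:SS' (hours may exceed 99) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def size_to_kb(value):
    """Convert a size such as '2097152kb' or '2gb' to kilobytes."""
    units = {"b": 1 / 1024, "kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}
    number, unit = re.fullmatch(r"(\d+)([a-z]+)", value.strip().lower()).groups()
    return int(number) * units[unit]


def summarize(text, physical_ram_kb):
    """Compare CPU time with wall time, and virtual memory with physical RAM."""
    fields = parse_qstat_fields(text)
    cput = hms_to_seconds(fields["resources_used.cput"])
    wall = hms_to_seconds(fields["resources_used.walltime"])
    vmem_kb = size_to_kb(fields["resources_used.vmem"])
    return {
        "cpu_to_wall_ratio": cput / wall if wall else None,
        "vmem_kb": vmem_kb,
        "vmem_exceeds_ram": vmem_kb > physical_ram_kb,
    }


if __name__ == "__main__":
    # 4 GB of physical RAM is an arbitrary figure for the example.
    print(summarize(SAMPLE, physical_ram_kb=4 * 1024 * 1024))
```

Since CPU time is summed over threads, a ratio far below the number of
threads in use, or a vmem value well above physical RAM, might suggest that
the process is waiting on swap or I/O rather than computing.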

Thanks,
  Eric



On Tue, Mar 19, 2013 at 11:23 AM, Alexis Groppi <
alexis.groppi at u-bordeaux2.fr> wrote:

>  Hi Adina,
>
> First of all, thanks for your answer and your advice :)
> The script extract-partitions.py works!
> As for do-partition.py on my second set, it has been running for 32 hours.
> Should it not have produced at least one temporary .pmap file by now?
>
> Thanks again
>
> Alexis
>
> On 19/03/2013 12:58, Adina Chuang Howe wrote:
>
>
>
>  Message: 1
>> Date: Tue, 19 Mar 2013 10:41:45 +0100
>> From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
>> Subject: [khmer] Duration of do-partition.py (very long !)
>> To: khmer at lists.idyll.org
>> Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>
>> Hi Titus,
>>
>> After digital normalization and filter-below-abund, on your advice I
>> ran do-partition.py on 2 sets of data (approx. 2.5 million
>> reads (75 nt)):
>>
>> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
>> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
>> and
>> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
>> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>>
>> For the first one I got a
>> 174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info with the
>> information: 33 subsets total
>> After that, 33 .pmap files, from 0.pmap to 32.pmap, were created at regular
>> intervals, and finally I got a single file,
>> 174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap files were
>> deleted).
>> This run took approx. 56 hours.
>>
>> For the second set (174r2), do-partition.py has been running for 32 hours,
>> but I only got the
>> 174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info with the
>> information: 35 subsets total
>> And nothing more...
>>
>> Is this duration "normal"?
>>
>
>  Yes, this is typical.  The longest I've had it run is 3 weeks, for a very
> large dataset (billions of reads).  In general, partitioning is the most
> time-consuming of all the steps.  Once it's finished, you'll have much smaller
> files which can be assembled very quickly.  Since I run assembly with
> multiple assemblers and multiple k-mer lengths, this gain is often
> significant for me.
>
>  To get the actual partitioned files, you can use the following script:
>
>
> https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
>
>  (The thread parameter is left at its default (4 threads).)
>> 33 subsets and only one file at the end?
>> Should I stop do-partition.py on the second set and rerun it with more
>> threads?
>>
>>
>  I'd suggest letting it run.
>
>  Best,
> Adina
>
>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
>
>
>
>


-- 
Eric McDonald
HPC/Cloud Software Engineer
  for the Institute for Cyber-Enabled Research (iCER)
  and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733