[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)

Alexis Groppi alexis.groppi at u-bordeaux2.fr
Tue Mar 19 08:23:02 PDT 2013


Hi Adina,

First of all, thank you for your answer and your advice :)
The script extract-partitions.py works!
As for do-partition.py on my second set, it has now been running for
32 hours. Shouldn't it have produced at least one temporary .pmap file
by now?

Thanks again

Alexis
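
A minimal sketch for checking how far a do-partition.py run has got, assuming the .info file contains a line of the form "N subsets total" (as quoted below) and that the temporary files end in .pmap and are written to a directory you can list. The function name partition_progress and both of its arguments are hypothetical; this is not part of khmer.

```python
import re
from pathlib import Path


def partition_progress(info_path, workdir):
    """Return (pmap_files_found, subsets_total) for a partitioning run.

    subsets_total is None if the .info file has no "N subsets total" line.
    """
    # Assumption: the .info file reports the subset count as "N subsets total".
    text = Path(info_path).read_text()
    match = re.search(r"(\d+)\s+subsets total", text)
    total = int(match.group(1)) if match else None

    # Assumption: temporary partition maps carry a .pmap extension
    # and sit directly in workdir.
    found = len(list(Path(workdir).glob("*.pmap")))
    return found, total
```

Calling partition_progress with the path of the .graphbase.info file and the directory where the .pmap files are expected returns a pair such as (0, 35) while no subset has finished yet.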

On 19/03/2013 12:58, Adina Chuang Howe wrote:
>
>
>     Message: 1
>     Date: Tue, 19 Mar 2013 10:41:45 +0100
>     From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
>     Subject: [khmer] Duration of do-partition.py (very long !)
>     To: khmer at lists.idyll.org
>     Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
>     Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
>     Hi Titus,
>
>     After digital normalization and filter-below-abund, on your advice I
>     ran do-partition.py on 2 sets of data (approx. 2.5 million reads,
>     75 nt):
>
>     /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>     /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
>     /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
>     and
>     /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
>     /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
>     /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>
>     For the first one I got the file
>     174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
>     containing the line: 33 subsets total
>     After that, 33 .pmap files (0.pmap to 32.pmap) were created at
>     regular intervals, and finally I got a single file,
>     174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap files
>     were deleted).
>     This run took approx. 56 hours.
>
>     For the second set (174r2), do-partition.py has been running for 32
>     hours, but so far I only have the file
>     174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
>     containing the line: 35 subsets total
>     And nothing more...
>
>     Is this duration "normal"?
>
>
> Yes, this is typical.  The longest I've had it run is 3 weeks, for a
> very large dataset (billions of reads).  In general, partitioning is
> the most time-consuming of all the steps.  Once it's finished, you'll
> have much smaller files which can be assembled very quickly.  Since I
> run assembly with multiple assemblers and with multiple K lengths, this
> gain is often significant for me.
>
> To get the actual partitioned files, you can use the following script:
>
> https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
>
>     (The thread parameter was left at its default (4 threads).)
>     33 subsets and only one file at the end?
>     Should I stop do-partition.py on the second set and rerun it with
>     more threads?
>
>
> I'd suggest letting it run.
>
> Best,
> Adina
