[khmer] Duration of do-partition.py (very long !) (Alexis Groppi)
Alexis Groppi
alexis.groppi at u-bordeaux2.fr
Tue Mar 19 08:23:02 PDT 2013
Hi Adina,
First of all, thanks for your answer and your advice :)
The script extract-partitions.py works!
As for do-partition.py on my second set, it has been running for 32 hours.
Shouldn't it have produced at least one temporary .pmap file by now?
Thanks again
Alexis
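
A minimal sketch for checking how far a do-partition.py run has progressed, based only on what is described in this thread: the .info file reports a line such as "35 subsets total", and temporary .pmap files appear as subsets complete. The helper names (expected_subsets, partition_progress) are hypothetical, and the assumption that each .pmap file name starts with the graphbase name is mine, not something confirmed by the khmer documentation.

```python
import re
from pathlib import Path


def expected_subsets(info_text):
    """Return N from a line like 'N subsets total', or None if no such line."""
    match = re.search(r"(\d+)\s+subsets total", info_text)
    return int(match.group(1)) if match else None


def partition_progress(graphbase):
    """Return (pmap_files_found, expected_total) for a graphbase prefix.

    Assumption: the .info file is named <graphbase>.info and the temporary
    files are named <graphbase>...pmap in the same directory.
    """
    base = Path(graphbase)
    info_path = Path(str(base) + ".info")
    total = None
    if info_path.exists():
        total = expected_subsets(info_path.read_text())
    # Count files in the same directory that share the prefix and end in .pmap.
    done = sum(
        1
        for p in base.parent.iterdir()
        if p.name.startswith(base.name) and p.name.endswith(".pmap")
    )
    return done, total
```

Calling partition_progress() with the graphbase path passed to do-partition.py would return a pair such as (0, 35) while no subset has finished. Since the .pmap files are deleted once the final .part file is written, a count of zero can also mean the run has already completed.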
On 19/03/2013 12:58, Adina Chuang Howe wrote:
>
>
> Message: 1
> Date: Tue, 19 Mar 2013 10:41:45 +0100
> From: Alexis Groppi <alexis.groppi at u-bordeaux2.fr>
> Subject: [khmer] Duration of do-partition.py (very long !)
> To: khmer at lists.idyll.org
> Message-ID: <514832D9.7090207 at u-bordeaux2.fr>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Hi Titus,
>
> After digital normalization and filter-below-abund, on your advice I
> ran do-partition.py on 2 sets of data (approx. 2.5 million
> reads of 75 nt):
>
> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below.graphbase
> /ag/khmer/Sample_174/174r1_prinseq_good_bFr8.fasta.keep.below
> and
> /khmer-BETA/scripts/do-partition.py -k 20 -x 1e9
> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase
> /ag/khmer/Sample_174/174r2_prinseq_good_1lIQ.fasta.keep.below
>
> For the first one I got a
> 174r1_prinseq_good_bFr8.fasta.keep.below.graphbase.info
> file containing the line: 33 subsets total
> After that, 33 .pmap files (0.pmap to 32.pmap) were created at regular
> intervals, and finally I got a single file,
> 174r1_prinseq_good_bFr8.fasta.keep.below.part (all the .pmap files
> were deleted).
> This run took approx. 56 hours.
>
> For the second set (174r2), do-partition.py has been running for 32 hours,
> but so far I have only got
> 174r2_prinseq_good_1lIQ.fasta.keep.below.graphbase.info
> containing the line: 35 subsets total
> And nothing more...
>
> Is this duration "normal" ?
>
>
> Yes, this is typical. The longest I've had it run is 3 weeks, for a very
> large dataset (billions of reads). In general, partitioning is the most
> time-consuming of all the steps. Once it's finished, you'll have much
> smaller files which can be assembled very quickly. Since I run
> assembly with multiple assemblers and multiple k lengths, this gain
> is often significant for me.
>
> To get the actual partitioned files, you can use the following script:
>
> https://github.com/ged-lab/khmer/blob/master/scripts/extract-partitions.py
>
> (The thread parameter is left at its default (4 threads))
> 33 subsets and only one file at the end?
> Should I stop do-partition.py on the second set and rerun it with
> more threads?
>
>
> I'd suggest letting it run.
>
> Best,
> Adina
>
>