[khmer] How to speed up the filter-below-abund script?

Eric McDonald emcd.msu at gmail.com
Tue Mar 12 16:55:21 PDT 2013


Hi Alexis,

One way to get the 'bleeding-edge' branch is to clone it into a fresh
directory; for example:
   git clone http://github.com/ged-lab/khmer.git -b bleeding-edge khmer-BETA
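With '-b', the new working copy should already be checked out on that branch.
To double-check (plain Git, nothing khmer-specific):
   cd khmer-BETA
   git branch
The branch marked with '*' should be bleeding-edge.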

Assuming you already have a clone of the 'ged-lab/khmer' repo, you should
also be able to do:
  git fetch origin
  git checkout bleeding-edge
Depending on how old your Git client is and what its defaults are, you may
have to do the following instead:
  git checkout --track -b bleeding-edge origin/bleeding-edge
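If you take the fetch/checkout route, the same kind of sanity check applies
before rebuilding and re-running anything:
  git status
  git log --oneline -1
git status should report that you are on the bleeding-edge branch, and the
log line should show its most recent commit.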

Hope this helps,
  Eric


On Tue, Mar 12, 2013 at 11:32 AM, Alexis Groppi <alexis.groppi at u-bordeaux2.fr> wrote:

>
> On 12/03/2013 16:16, C. Titus Brown wrote:
>
> On Tue, Mar 12, 2013 at 04:15:05PM +0100, Alexis Groppi wrote:
>
>  Hi Titus,
>
> Thanks for your answer.
> Actually, this is my second attempt with filter-below-abund.
> The first time, I thought the problem came from the location of my table.kh
> file: it was on storage with poor I/O performance.
> I killed the job after 24 h, moved the file to a better location, and re-ran
> it, but with the same result: no completion after 24 h.
>
> Any idea?
>
> Thanks
>
> Cheers from Bordeaux :)
>
> Alexis
>
> PS: The command line was the following:
>
> ./filter-below-abund.py 174r1_table.kh 174r1_prinseq_good_bFr8.fasta.keep
>
> Is this correct?
>
>  Yes, looks right... Can you try with the bleeding-edge branch, which now
> incorporates a potential fix for this issue?
>
> From here: https://github.com/ged-lab/khmer/tree/bleeding-edge ?
> or
> here: https://github.com/ctb/khmer/tree/bleeding-edge ?
>
> Do I have to make a fresh install? And how?
> Or just replace all the files and folders?
>
> Thanks :)
>
> Alexis
>
>
>
> thanks,
> --titus
>
>
> On 12/03/2013 14:41, C. Titus Brown wrote:
>
>  On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
>
> Metagenome assembly:
> My data:
> - original (quality-filtered) data: 4463243 reads (75 nt) (Illumina)
> 1/ Single-pass digital normalization with normalize-by-median (C=20)
> ==> .keep file of 2560557 reads
> 2/ Generated a hash table with load-into-counting on the .keep file
> ==> .kh file of ~16 GB (huge file?!)
> 3/ filter-below-abund with C=100 from the two previous files (table.kh
> and reads.keep)
> Still running after 24 hours :(
>
> Any advice to speed up this step? ... and the others (partitioning ...)?
>
> I have access to an HPC cluster: ~3000 cores.
>
>  Hi Alexis,
>
> filter-below-abund and filter-abund have occasional bugs that prevent them
> from completing.  I would kill and restart.  For that few reads it should
> take no more than a few hours to do everything.
>
> Note that most of what khmer does cannot easily be distributed across
> multiple machines.
>
> best,
> --titus
>


-- 
Eric McDonald
HPC/Cloud Software Engineer
  for the Institute for Cyber-Enabled Research (iCER)
  and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733

