[khmer] RE: Lump release - find-knots step

Wed Oct 30 17:48:22 PDT 2013

On Tue, Oct 29, 2013 at 05:16:23PM +0000, Adi Faigenboim wrote:
> Hi Titus,
> Thank you for your response. From what I understand, I can use a different coverage than 20, for example c=5, which will create a smaller lump. Is there anything I can do with the current lump to speed up the process (number of pmaps: 8352)?
> I looked at find-knots.py and am wondering whether I can run pmaps 1 to 4000 on one computer and the rest on another computer?

Hi Adi,

Going to a lower coverage in one pass will fragment your assembly;
see the diginorm paper for a discussion of this.  You would want
to use the three-pass diginorm in the kalamazoo protocol, below.
I'm not sure what effect it will have on the lump, though.

You can definitely run pmaps on different computers.  However, I would
suggest switching to the filter-below-abund approach first...
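
If you do go the multi-machine route, here is a minimal sketch of
dividing the pmap files into one work list per machine.  The function
name split_pmaps and the file names in the example are hypothetical,
made up for illustration; they are not part of khmer, and how each
machine then consumes its list is left open.

```python
def split_pmaps(pmap_files, n_machines):
    """Split pmap file names into n_machines contiguous, near-equal chunks.

    Hypothetical helper for illustration only; not part of khmer.
    """
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    files = sorted(pmap_files)          # fixed order, so every machine agrees
    size, extra = divmod(len(files), n_machines)
    chunks = []
    start = 0
    for i in range(n_machines):
        # The first `extra` chunks each take one additional file.
        end = start + size + (1 if i < extra else 0)
        chunks.append(files[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    # Assumed naming pattern; substitute the real pmap file names.
    names = ["lump.subset.%d.pmap" % i for i in range(8352)]
    for machine, chunk in enumerate(split_pmaps(names, 2)):
        print("machine %d: %d pmap files" % (machine, len(chunk)))
```

Every file lands in exactly one chunk, so no pmap is processed twice
and none is skipped.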

cheers,
--titus

> _____________________________________
> From: C. Titus Brown [ctb at msu.edu]
> Sent: Sunday, 27 October 2013 21:03
> To: Adi Faigenboim
> Cc: khmer at lists.idyll.org
> Subject: Re: [khmer] Lump release - find-knots step
> 
> On Sun, Oct 27, 2013 at 12:26:52PM +0000, Adi Faigenboim wrote:
> > I have a metagenome of about 2.5G reads. I used the khmer pipeline with diginorm c=20, filtering and partitioning. After the partitioning step I received 345 groups and a very big knot (123 GB). When using the knot release pipeline I received 8352 pmaps. I'm currently in find-knots.py using -x 70e9 -N 4. After running for a month in this step, only 1450 pmaps have been processed... Is it possible that this stage would take so long?
> > Can I split this stage across different computers (run the loop over the pmap_files in parallel)?
> > Can you please shed some light on what could be the cause of this, and should I maybe do the partitioning in a different way?
> > I tried lowering the coverage to c=10 in the diginorm step but got 20% less data, which I think is rather a lot.
> 
> Hi Adi,
> 
> we've got a faster approach in the works -- see 'filter-below-abund',
> as used in the partitioning section of this protocol:
> 
> https://khmer-protocols.readthedocs.org/en/latest/metagenomics/index.html
> 
> And yes, the problem is that lump removal in the find-knots script is
> dependent on exhaustively traversing all of the repetitive sequence.
> It works really poorly on high-coverage data sets :(.
> 
> best,
> --titus
> --
> C. Titus Brown, ctb at msu.edu

-- 
C. Titus Brown, ctb at msu.edu