[khmer] RE: Lump release - find-knots step
C. Titus Brown
ctb at msu.edu
Wed Oct 30 17:48:22 PDT 2013
On Tue, Oct 29, 2013 at 05:16:23PM +0000, Adi Faigenboim wrote:
> Hi Titus,
> Thank you for your response. From what I understand, I can use a coverage lower than 20, for example c=5, which will create a smaller lump. Is there anything I can do with the current lump to speed up the process (number of pmaps: 8352)?
> I looked at find-knots.py and am wondering whether I can run pmaps 1 through 4000 on one computer and the rest on another computer?
Going to a lower coverage in one pass will fragment your assembly;
see the diginorm paper for a discussion of this. You would want
to use the three-pass diginorm in the kalamazoo protocol, below.
I'm not sure what effect it will have on the lump tho.
You can definitely run pmaps on different computers. However, I would
suggest switching to the filter-below-abund approach first...
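As a rough illustration of splitting the pmap files between machines, here is a minimal Python sketch. It only divides a list of file names into contiguous, near-equal chunks. The function name split_pmaps and the file-name pattern are hypothetical and not part of khmer, and how each chunk is then handed to find-knots.py is not covered here.

```python
def split_pmaps(pmap_files, n_machines):
    """Split a list of pmap file names into n_machines contiguous chunks.

    Chunk sizes differ by at most one file. This helper is illustrative
    only; it is not part of khmer.
    """
    if n_machines < 1:
        raise ValueError("n_machines must be at least 1")
    files = sorted(pmap_files)
    base, extra = divmod(len(files), n_machines)
    chunks = []
    start = 0
    for i in range(n_machines):
        # The first `extra` chunks each take one additional file.
        size = base + (1 if i < extra else 0)
        chunks.append(files[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical file names; in practice the list could come from
    # something like glob.glob("*.pmap") in the partitioning directory.
    names = ["lump.subset.%d.pmap" % i for i in range(8352)]
    for machine, chunk in enumerate(split_pmaps(names, 2)):
        print(machine, len(chunk))
```

Each machine would then work only on its own chunk, so no pmap file is processed twice.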
> From: C. Titus Brown [ctb at msu.edu]
> Sent: Sunday, October 27, 2013 21:03
> To: Adi Faigenboim
> Cc: khmer at lists.idyll.org
> Subject: Re: [khmer] Lump release - find-knots step
> On Sun, Oct 27, 2013 at 12:26:52PM +0000, Adi Faigenboim wrote:
> > I have a metagenome of about 2.5G reads. I used the khmer pipeline with diginorm c=20, filtering and partitioning. After the partitioning step I received 345 groups and a very big knot (123 GB). When using the knot release pipeline I received 8352 pmaps. I'm currently in find-knots.py using -x 70e9 -N 4. After running for a month in this step, only 1450 pmaps have been processed... Is it possible that this stage would take so long?
> > Can I split this stage across different computers (run the loop over the pmap_files in parallel)?
> > Can you please shed some light on what could be causing this, and should I maybe do the partitioning in a different way?
> > I tried lowering the coverage to c=10 in the diginorm step but got 20% less data, which I think is rather a lot.
> Hi Adi,
> we've got a faster approach in the works -- see 'filter-below-abund',
> as used in the partitioning section of this protocol:
> And yes, the problem is that lump removal in the find-knots script is
> dependent on exhaustively traversing all of the repetitive sequence.
> It works really poorly on high-coverage data sets :(.
> C. Titus Brown, ctb at msu.edu
C. Titus Brown, ctb at msu.edu