[khmer] Lump release - find-knots step

Sun Oct 27 12:03:49 PDT 2013

On Sun, Oct 27, 2013 at 12:26:52PM +0000, Adi Faigenboim wrote:
> I have a metagenome of about 2.5G reads. I used the khmer pipeline with diginorm c=20, filtering and partitioning. After the partitioning step I received 345 groups and a very big knot (123 GB). When using the knot release pipeline I received 8352 pmaps. I'm currently in find-knots.py using -x 70e9 -N 4. After running for a month in this step, only 1450 pmaps have been processed. Is it possible that this stage would take so long?
> Can I split this stage across different computers (run the loop over the pmap_files in parallel)?
> Can you please shed some light on what could be the cause of this, and should I maybe do the partitioning in a different way?
> I tried lowering the coverage to c=10 in the diginorm step but got 20% less data, which I think is rather a lot.

Hi Adi,

we've got a faster approach in the works -- see 'filter-below-abund',
as used in the partitioning section of this protocol:

https://khmer-protocols.readthedocs.org/en/latest/metagenomics/index.html

And yes, the problem is that lump removal in the find-knots script is
dependent on exhaustively traversing all of the repetitive sequence.
It works really poorly on high-coverage data sets :(.
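
To give a rough feel for why exhaustive traversal gets expensive, here is
a toy Python sketch. It is not khmer's code: the function names, the
plain-set k-mer graph, and the tiny read are all made up for
illustration. It walks every k-mer reachable from a starting k-mer, and
shows how marking one k-mer as a "stop tag" cuts the walk short. Without
stop tags, the work grows with the number of connected k-mers, which is
what a big lump is.

```python
from collections import deque


def kmers(seq, k):
    """Return all k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_set(reads, k):
    """Toy stand-in for a k-mer graph: just the set of k-mers seen."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    return nodes


def neighbors(kmer, nodes):
    """Yield k-mers in nodes that overlap kmer by k-1 bases on either side."""
    for base in "ACGT":
        nxt = kmer[1:] + base
        if nxt in nodes:
            yield nxt
        prev = base + kmer[:-1]
        if prev in nodes:
            yield prev


def traverse(start, nodes, stop_tags=frozenset()):
    """Breadth-first walk from start; never step onto a stop-tagged k-mer.

    Returns the number of k-mers visited, which is a proxy for the work done.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for n in neighbors(cur, nodes):
            if n not in seen and n not in stop_tags:
                seen.add(n)
                queue.append(n)
    return len(seen)


if __name__ == "__main__":
    nodes = build_kmer_set(["ACGTACGGTC"], 4)
    print(traverse("ACGT", nodes))                      # visits every k-mer
    print(traverse("ACGT", nodes, stop_tags={"TACG"}))  # walk is cut short
```

On real high-coverage data the connected region can hold a huge number
of k-mers, so the first kind of walk is the slow part.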

best,
--titus
-- 
C. Titus Brown, ctb at msu.edu