[khmer] Diginorm and error correction

C. Titus Brown ctb at msu.edu
Fri Dec 5 08:53:31 PST 2014


On Fri, Dec 05, 2014 at 04:49:27PM +0000, Daniel Standage wrote:
> Greetings!
> 
> I have a quick question. I understand the primary motivation behind digital
> normalization, the idea of discarding data without losing any information.
> My question is about the claim that diginorm retains all real kmers while
> discarding erroneous ones. After reading over the arXiv preprint again, it
> seems this claim is independent of the three-pass protocol which does
> additional error correction.
> 
> If we assume that errors are present in low abundance, why would diginorm
> ever discard a read containing an error? Wouldn't the same error have to be
> present a certain number of times before the associated kmers had
> sufficient coverage to discard those reads? In that case, we're much less
> confident that it's not real variation. Or are there probabilistic data
> structures involved that discard likely errors?
> 
> Thanks!
> Daniel

Hey Daniel,

More/better answer later, but look at the part of the paper where we talk
about losing the tips of contigs in the mRNAseq simulation. The median k-mer count
cannot tell the difference between undersampled contig edges and errors (which
may occur in real data sets).
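A toy sketch of why this happens, assuming the basic diginorm rule: keep a read only while the median count of its k-mers is below a cutoff C, and count k-mers only from kept reads. This is not khmer's code. It uses an exact Python dict where khmer uses a probabilistic (Count-Min Sketch-style) table, and it ignores reverse complements. The reads, k=5 and C=5 are made-up illustration values. Because most k-mers in a read with one error are still real, high-count k-mers, the median stays high and the read is discarded, and its error k-mers are discarded with it. A read that reaches a little way into real but unsampled sequence (a contig tip) looks the same to the median, so it is dropped too.

```python
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer of seq (assumes len(seq) >= k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def median_count(read, counts, k):
    """Median abundance of the read's k-mers in the current count table."""
    return median([counts.get(km, 0) for km in kmers(read, k)])


def normalize(reads, k=5, cutoff=5):
    """Toy digital normalization: keep a read only if its median k-mer
    count is still below cutoff, and count k-mers from kept reads only.

    Exact dict counts are used for clarity; khmer stores counts in a
    probabilistic table instead. Reverse complements are ignored here.
    """
    counts = {}
    kept = []
    for read in reads:
        if median_count(read, counts, k) < cutoff:
            kept.append(read)
            for km in kmers(read, k):
                counts[km] = counts.get(km, 0) + 1
    return kept, counts


# Hypothetical reads, chosen so that all 5-mers of TRUE_READ are distinct.
TRUE_READ = "ATGCGTACGTTAGCCTAGGA"   # sequenced 10 times: high coverage
ERROR_READ = "ATGCGTACGTCAGCCTAGGA"  # one substitution (T->C) at position 10
TIP_READ = "CGTACGTTAGCCTAGGACAT"    # shifted 3 bases; "CAT" stands in for real, unsampled sequence

kept, counts = normalize([TRUE_READ] * 10 + [ERROR_READ, TIP_READ])
```

Only the first five copies of TRUE_READ are kept. ERROR_READ has 5 novel k-mers and TIP_READ has 3 novel k-mers out of 16, so both have a median count of 5 and both are discarded: the rule has no way to tell the sequencing error from the undersampled edge. Order also matters in this sketch; if TIP_READ arrived before the high-coverage reads, its median would be 0 and it would be kept.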

But good question :)

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu


