[khmer] Diginorm and error correction
C. Titus Brown
ctb at msu.edu
Fri Dec 5 08:53:31 PST 2014
On Fri, Dec 05, 2014 at 04:49:27PM +0000, Daniel Standage wrote:
> Greetings!
>
> I have a quick question. I understand the primary motivation behind digital
> normalization, the idea of discarding data without losing any information.
> My question is about the claim that diginorm retains all real kmers while
> discarding erroneous ones. After reading over the arXiv preprint again, it
> seems this claim is independent of the three-pass protocol which does
> additional error correction.
>
> If we assume that errors are present in low abundance, why would diginorm
> ever discard a read containing an error? Wouldn't the same error have to be
> present a certain number of times before the associated kmers had
> sufficient coverage to discard those reads? In that case, we're much less
> confident that it's not real variation. Or are there probabilistic data
> structures involved that discard likely errors?
>
> Thanks!
> Daniel
Hey Daniel,
More/better answer later, but look at the part of the paper where we talk
about losing the tips of contigs in the mRNAseq simulation. The median k-mer
count cannot tell the difference between undersampled contig edges and errors,
so those edges can be discarded along with the errors (and that may happen in
real data sets, too).
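To make the median point concrete, here is a minimal single-pass sketch of median-k-mer-count normalization in Python. It is an illustration under stated assumptions, not khmer's implementation. An exact dict stands in for khmer's fixed-memory probabilistic counting table, reverse complements are ignored, and K, CUTOFF, kmers(), normalize(), and the toy reads are all hypothetical. A read is kept only if the median count of its k-mers is below the cutoff. Once a region is well covered, a read carrying a single error is discarded because its correct k-mers outvote the erroneous ones, so the erroneous k-mers never enter the table. A read that runs past the well-covered region into real but undersampled sequence is treated exactly the same way, which is one way contig tips can be lost.

```python
from statistics import median

# Hypothetical toy parameters; real runs use a much larger k (e.g. k=20).
K = 5       # k-mer size
CUTOFF = 3  # coverage cutoff C


def kmers(read, k=K):
    """Hypothetical helper: all overlapping k-mers of `read` (forward strand only)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize(reads, k=K, cutoff=CUTOFF):
    """Hypothetical single-pass median-k-mer-count normalization (sketch).

    A read is kept only if the median count of its k-mers, as recorded from
    previously *kept* reads, is below `cutoff`. Only kept reads update the
    table. A plain dict stands in for khmer's probabilistic counting table.
    """
    counts = {}
    kept = []
    for read in reads:
        kms = kmers(read, k)
        if not kms:  # read shorter than k: nothing to count
            continue
        if median(counts.get(km, 0) for km in kms) < cutoff:
            kept.append(read)
            for km in kms:
                counts[km] = counts.get(km, 0) + 1
    return kept, counts


true_read = "ACGTTGCAGGTACCTAGG"
# Same read with one substitution (G -> C at position 9): 5 of its 14 k-mers are novel.
error_read = "ACGTTGCAGCTACCTAGG"
# Real sequence running 5 bases past the well-covered region (an undersampled edge).
edge_read = "TGCAGGTACCTAGGCATCA"

reads = [true_read] * 5 + [error_read, edge_read]
kept, counts = normalize(reads)

print(len(kept))          # 3: coverage saturates after three copies
print("GCAGC" in counts)  # False: the erroneous k-mers never enter the table
print("CATCA" in counts)  # False: and neither do the real edge k-mers
```

Note that the keep/discard decision depends on read order, because the table only reflects reads kept so far.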
But good question :)
cheers,
--titus
--
C. Titus Brown, ctb at msu.edu