[khmer] Diginorm and error correction

Daniel Standage daniel.standage at gmail.com
Fri Dec 5 08:49:27 PST 2014


Greetings!

I have a quick question. I understand the primary motivation behind digital
normalization: discarding redundant data without losing information.
My question is about the claim that diginorm retains all real kmers while
discarding erroneous ones. After rereading the arXiv preprint, it
seems this claim is independent of the three-pass protocol, which does
additional error correction.

If we assume that errors are present in low abundance, why would diginorm
ever discard a read containing an error? Wouldn't the same error have to be
present a certain number of times before the associated kmers had
sufficient coverage to discard those reads? In that case, we're much less
confident that it's not real variation. Or are there probabilistic data
structures involved that discard likely errors?
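
For concreteness, the decision rule under discussion can be sketched as
follows. This is a minimal sketch, not khmer's implementation: it assumes a
read is kept only when the median abundance of its k-mers, counted over the
reads kept so far, is below a coverage cutoff, and it uses an exact
in-memory counter in place of a probabilistic counting structure. The names
kmers, diginorm, k, and cutoff are illustrative and are not khmer's API.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """Return all overlapping k-mers in a read (empty if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def diginorm(reads, k=4, cutoff=3):
    """Sketch of median-based normalization (assumed rule, exact counts).

    A read is kept, and its k-mers counted, only if the median abundance
    of its k-mers among previously kept reads is below `cutoff`.
    Returns the kept reads and the k-mer counts.
    """
    counts = Counter()
    kept = []
    for read in reads:
        kms = kmers(read, k)
        if not kms:
            continue  # read too short to contain a k-mer
        # Counter returns 0 for k-mers that have not been seen yet.
        if median(counts[km] for km in kms) < cutoff:
            counts.update(kms)
            kept.append(read)
    return kept, counts


# Hypothetical toy data: BAD differs from GOOD by one substitution.
GOOD = "ACGTTGCAATCCGGATAC"
BAD = "ACGTTGCAGTCCGGATAC"
kept_late, counts_late = diginorm([GOOD, GOOD, GOOD, BAD, GOOD])
kept_early, counts_early = diginorm([BAD, GOOD, GOOD, GOOD, GOOD])
```

In this sketch the decision depends on the median over all k-mers in the
read, not on the few k-mers that span the substitution, so whether the toy
read BAD is kept depends on how much of the surrounding sequence has already
been seen when it arrives.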

Thanks!
Daniel

--
Daniel Standage
Ph.D. Candidate
Computational Genome Science Lab
Indiana University