[khmer] filter-below-abundance typical discard rate

Chuck chuck.peperanney at gmail.com
Tue Jun 10 11:19:42 PDT 2014


Hmmm, the false positive rate was 0.015. Here are the load-into-counting
parameters:

PARAMETERS:
 - kmer size =    20            (-k)
 - n tables =     4             (-N)
 - min tablesize = 3.7e+10      (-x)
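
For reference, these parameters can be turned into a rough memory and load estimate. The Python sketch below makes three assumptions. First, it assumes one byte per counter, which is consistent with the 255 ceiling mentioned later in the thread. Second, it assumes the reported false positive rate is approximately the per-table occupancy raised to the number of tables. Third, it assumes uniform hashing. It is a back-of-the-envelope check, not the computation load-into-counting itself performs. khmer sizes each table to a prime near -x, so treat the numbers as approximate.

```python
import math

# Parameters reported by load-into-counting in this thread.
K = 20                  # k-mer size (-k); not needed for the estimate below
N_TABLES = 4            # number of tables (-N)
MIN_TABLESIZE = 3.7e10  # entries per table (-x)
REPORTED_FP = 0.015     # false positive rate reported by the script

# Assumption: one byte per counter (8-bit counts, hence the 255 ceiling).
memory_gb = N_TABLES * MIN_TABLESIZE / 1e9

# Assumption: fp ~= (per-table occupancy) ** N_TABLES.
occupancy = REPORTED_FP ** (1.0 / N_TABLES)

# Assumption: uniform hashing, so occupancy ~= 1 - exp(-n_distinct / table_size).
distinct_kmers = -MIN_TABLESIZE * math.log(1.0 - occupancy)

print(f"memory ~ {memory_gb:.0f} GB")
print(f"per-table occupancy ~ {occupancy:.2f}")
print(f"distinct k-mers ~ {distinct_kmers:.2e}")
```

Under those assumptions, the tables take roughly 148 GB. Each table is about 35% occupied, and the input holds on the order of 1.6e10 distinct k-mers.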

Any ideas for diagnosing whether normalize-by-median is keeping many highly
erroneous reads? Would that be apparent from the k-mer histogram?
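
The k-mer abundance histogram records the number of distinct k-mers seen at each abundance. A large spike of k-mers at abundance 1 or 2, relative to the main coverage peak, is usually the signature of sequencing errors. khmer's abundance-dist script reports this from a saved counting table. The toy Python sketch below only illustrates the idea. It assumes exact counts, canonical k-mers (a k-mer and its reverse complement counted together), and saturation at 255 to mimic 8-bit counters. The function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_counts(reads, k):
    """Exact canonical k-mer counts; k-mers containing non-ACGT bases are skipped."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _BASES:
                counts[canonical(kmer)] += 1
    return counts


def abundance_histogram(counts, cap=255):
    """Map abundance -> number of distinct k-mers, saturating at `cap`
    (assumed here to mimic 8-bit counters)."""
    hist = Counter(min(c, cap) for c in counts.values())
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Toy data: the last read differs from the others by one substitution.
    reads = ["ACGTACGTTA", "ACGTACGTTA", "ACGTACCTTA"]
    for abundance, n_kmers in abundance_histogram(kmer_counts(reads, 5)).items():
        print(abundance, n_kmers)
```

On real data, the same histogram would be compared before and after normalization to see how much low-abundance mass survives.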

The final discard rate for filter-below-abundance with a cutoff of 225 was
16% (reads normalized to C=20). Does this seem high given your experience?
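
A sweep over several cutoffs shows how sensitive the discard rate is to the cutoff. The Python sketch below is a hypothetical model, not filter-below-abund's actual code. It assumes each read keeps the longest prefix whose k-mers all have counts at or below the cutoff. It also assumes counts saturate at 255 and that k-mers are counted on the forward strand only. The discard rate is the fraction of input bases removed.

```python
from collections import Counter


def count_kmers(reads, k, cap=255):
    """Forward-strand k-mer counts, saturated at `cap` (assumed 8-bit counters)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer: min(c, cap) for kmer, c in counts.items()}


def trim_at_high_abundance(read, counts, k, cutoff):
    """Hypothetical trim: keep the longest prefix whose k-mers all have
    count <= cutoff; drop the whole read if its first k-mer fails."""
    for i in range(len(read) - k + 1):
        if counts.get(read[i:i + k], 0) > cutoff:
            return read[:i + k - 1] if i > 0 else ""
    return read


def discard_rate(reads, counts, k, cutoff):
    """Fraction of input bases removed by the trim above."""
    total = sum(len(r) for r in reads)
    kept = sum(len(trim_at_high_abundance(r, counts, k, cutoff)) for r in reads)
    return 1.0 - kept / total if total else 0.0


def sweep(reads, k, cutoffs):
    """Discard rate for each cutoff, using one shared count table."""
    counts = count_kmers(reads, k)
    return {c: discard_rate(reads, counts, k, c) for c in cutoffs}


if __name__ == "__main__":
    toy_reads = ["AAAAAA", "CGTACG", "CGAAAT"]  # toy data only
    print(sweep(toy_reads, 3, [1, 2, 3, 4, 255]))
```

In this model, a cutoff at or above the 255 ceiling can never trigger a trim, because no stored count exceeds 255.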

-Chuck


On Tue, Jun 10, 2014 at 11:26 AM, C. Titus Brown <ctb at msu.edu> wrote:

> On Mon, Jun 09, 2014 at 08:33:45PM -0400, Chuck wrote:
> > I'm curious about typical values that people are seeing with
> > filter-below-abundance. With the default cutoff (50) I was discarding ~50%
> > of bp (after normalizing with C=20). If I increase the cutoff to 225 the
> > discard rate drops to 25%. I thought I was rigorously adapter trimming my
> > reads (I generally use scythe with default parameters and I monitor the
> > output fairly closely). Is this way outside the developers' experience?
> >
> > Also, at a cutoff of 235, I discard 0%. Not sure how to interpret this. I
> > realize that you don't count kmers above 255 by default with
> > load-into-counting. It seems that I don't have any kmers at the ends of
> > reads at a depth >= 235, but I trim much more data with what seems like a
> > small change in the cutoff value, from 235 to 225. Also, 235 < 255 :)
>
> That's tremendously weird.
>
> I have no other useful comment :)
>
> I can come up with some wild hypotheses about what might be going on,
> but have never seen this before.
>
> If, for example, your data was high coverage but each read had a lot of
> errors, then normalize-by-median might be keeping a lot of the highly
> erroneous reads while filter-below-abund trimmed off the legitimate sequence.
>
> I have no idea how to interpret the 225-to-235 numbers!  Fascinating.
>
> Hmm, what table size are you using and what false positive rate is being
> reported?
>
> cheers,
> --titus
> --
> C. Titus Brown, ctb at msu.edu
>

