[khmer] filter-below-abundance typical discard rate

C. Titus Brown ctb at msu.edu
Tue Jun 10 08:26:22 PDT 2014


On Mon, Jun 09, 2014 at 08:33:45PM -0400, Chuck wrote:
> I'm curious about typical values that people are seeing with
> filter-below-abundance. With the default cutoff (50) I was discarding ~50%
> of bp (after normalizing with C=20). If I increase the cutoff to 225 the
> discard rate drops to 25%. I thought I was rigorously adapter trimming my
> reads (I generally use scythe with default parameters and I monitor the
> output fairly closely). Is this way outside the developers' experience?
> 
> Also, at a cutoff of 235, I discard 0%. Not sure how to interpret this. I
> realize that you don't count kmers above 255 by default with
> load-into-counting. It seems that I don't have any kmers at the ends of
> reads at a depth >=235 but I trim much more data with what seems like a
> small change in the cutoff value from 235 to 225. Also, 235 < 255 :) .

That's tremendously weird.

I have no other useful comment :)

I can come up with some wild hypotheses about what might be going on,
but have never seen this before.

If, for example, your data was high coverage but each read had a lot of errors,
then normalize-by-median might be keeping a lot of the highly erroneous
reads while filter-below-abund trimmed off the legitimate sequence.

I have no idea how to interpret the 225-to-235 numbers!  Fascinating.

Hmm, what table size are you using and what false positive rate is being
reported?
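
(Why I'm asking: if the counting tables are too small for the number of
distinct k-mers, collisions inflate counts and abundance cutoffs stop
meaning what you think they mean. Below is a rough sketch of the usual
Bloom-filter-style approximation, not khmer's own code. The function name
and the inputs are made up for illustration, and it assumes k-mers hash
uniformly into each table.)

```python
import math


def estimate_fp_rate(n_unique_kmers, table_sizes):
    """Approximate the chance that an unseen k-mer collides in every table.

    Assumption: uniform hashing, so the expected fraction of occupied
    slots in a table of `size` entries is 1 - exp(-n / size).
    """
    fp = 1.0
    for size in table_sizes:
        occupancy = 1.0 - math.exp(-n_unique_kmers / size)
        # A false positive needs a collision in every table at once.
        fp *= occupancy
    return fp


if __name__ == "__main__":
    # Hypothetical numbers: 1e9 distinct k-mers, four tables of 1e9 slots.
    print(estimate_fp_rate(1e9, [1e9] * 4))
```

If the estimate comes out high (well above a few percent), larger tables
are worth trying before reading much into the cutoff behaviour.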

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu


