[khmer] Use diginorm to delete identical reads

C. Titus Brown ctb at msu.edu
Mon Aug 5 06:02:25 PDT 2013


Ahh, sorry, missed the Subject in my last e-mail!

Artificially duplicated reads (ADRs) tend not to be exact copies; instead they
share the same start position and have considerable sequence identity.  Since
diginorm doesn't use any alignment techniques, you would only be eliminating
sequences with exact 100 bp matches -- a significant minority of the ADRs.  I've
been thinking about how to modify khmer to let you cleanse ADRs, but I don't
have anything useful to say about it yet...

--titus
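
As a rough illustration of the point above, here is a simplified Python sketch of diginorm's keep-or-discard rule as we understand it: a read is kept only if the median count of its k-mers, counted over reads already kept, is below the cutoff C. It uses an exact dict where khmer uses a probabilistic counting table, and `diginorm_sketch` is a hypothetical name, not part of khmer's API. Setting k equal to the read length, as `-k 100` on 100 bp reads would, leaves each read with a single k-mer, so with C = 1 only exact repeats are dropped. (In practice khmer rejects `-k 100` outright; see the assertion in the quoted message.)

```python
from statistics import median


def kmers(seq, k):
    """Return every k-mer (substring of length k) in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def diginorm_sketch(reads, k, cutoff):
    """Simplified digital normalization using exact k-mer counts.

    Hypothetical helper for illustration only: khmer itself uses a
    probabilistic counting table, not a dict.
    """
    counts = {}  # k-mer -> number of times seen in reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # sketch choice: reads shorter than k are skipped
        med = median([counts.get(km, 0) for km in read_kmers])
        if med < cutoff:
            # Median coverage is still low: keep the read and count its k-mers.
            kept.append(read)
            for km in read_kmers:
                counts[km] = counts.get(km, 0) + 1
        # Otherwise the read is redundant at this cutoff and is discarded.
    return kept


if __name__ == "__main__":
    toy_reads = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "TTTTGGGG"]
    print(diginorm_sketch(toy_reads, k=8, cutoff=1))
```

With toy 8 bp reads and k = 8, the exact repeat of `ACGTACGT` is discarded, but `ACGTACGA`, which differs at a single base, is kept. This is why near-identical ADRs would slip through an exact-match approach.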

On Mon, Aug 05, 2013 at 02:38:35PM +0800, cy_jiang wrote:
> Hi all,
> 
> 
> I am wondering if I can use diginorm to remove identical reads from my dataset. 
> 
> 
> Specifically, I have paired-end reads of length 100 bp. I first interleaved the reads into one file, then ran diginorm with -k 100 -C 1 -x 7.2e9 -N 4, and the following error appeared:
> python: ktable.cc:21: khmer::HashIntoType khmer::_hash(const char*, khmer::WordLength, khmer::HashIntoType&, khmer::HashIntoType&): Assertion `k <= sizeof(HashIntoType)*4' failed.
> What exactly does this mean? Is there anything I can do to achieve my goal?
> 
> 
> Thanks in advance!
> 
> 
> Daniel
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
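
The failed assertion in the quoted message, `k <= sizeof(HashIntoType)*4`, caps the k-mer size. Assuming `HashIntoType` is a 64-bit (8-byte) unsigned integer and each base is packed into 2 bits, the limit is 8 * 4 = 32, that is, 64 bits / 2 bits per base = 32 bases. A request for `-k 100` therefore cannot fit. The sketch below illustrates 2-bit packing under those assumptions. The base-to-bits mapping and the name `pack_kmer` are illustrative choices, not khmer's actual encoding or API.

```python
HASH_BITS = 64        # assumed width of khmer's HashIntoType (8 bytes)
BITS_PER_BASE = 2     # A, C, G, T fit in 2 bits each
MAX_K = HASH_BITS // BITS_PER_BASE  # == sizeof(HashIntoType) * 4 == 32

# Illustrative 2-bit mapping; khmer's real encoding may differ.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_kmer(kmer):
    """Pack a DNA k-mer into one integer, 2 bits per base (hypothetical helper)."""
    if len(kmer) > MAX_K:
        # Mirrors the spirit of khmer's assertion: a longer k-mer
        # would not fit in a HASH_BITS-wide integer.
        raise ValueError(
            f"k={len(kmer)} exceeds {MAX_K}: "
            f"the k-mer would not fit in {HASH_BITS} bits"
        )
    value = 0
    for base in kmer:
        # Shift earlier bases left and append this base's 2 bits.
        value = (value << BITS_PER_BASE) | ENCODING[base]
    return value
```

For example, `pack_kmer("ACGT")` packs to `0b00011011` (27), and a 32-base k-mer uses all 64 bits, while a 100-base k-mer raises `ValueError`.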


-- 
C. Titus Brown, ctb at msu.edu



