[khmer] Using khmer on paired-end Illumina data set

Tue Nov 5 03:47:19 PST 2013

Hi Mamoon,

Is this mRNAseq or genomic?

You should be able to eliminate many duplicates by randomly shuffling
the dataset and then running digital normalization (diginorm).  However,
at the moment, khmer does not have any specific duplicate removal code.
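
A minimal sketch of the shuffling step, assuming the paired-end reads are
in two separate FASTQ files with plain 4-line records and that the whole
dataset fits in memory. The function names and file paths are hypothetical
and are not part of khmer. Each read stays with its mate, so the shuffled
files can still be treated as properly paired before running diginorm.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def shuffle_pairs(r1_path, r2_path, out1_path, out2_path, seed=None):
    """Shuffle read pairs in memory, keeping each read next to its mate.

    Hypothetical helper, not a khmer function. Returns the number of pairs.
    """
    with open(r1_path) as r1, open(r2_path) as r2:
        left = list(read_fastq(r1))
        right = list(read_fastq(r2))
    if len(left) != len(right):
        raise ValueError("R1 and R2 contain different numbers of reads")

    # Pair the mates first so one shuffle reorders both files identically.
    pairs = list(zip(left, right))
    random.Random(seed).shuffle(pairs)

    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in pairs:
            o1.writelines(rec1)
            o2.writelines(rec2)
    return len(pairs)
```

For example, shuffle_pairs("reads_1.fq", "reads_2.fq", "shuf_1.fq",
"shuf_2.fq", seed=42) would write two shuffled files in matching order;
passing a seed makes the shuffle reproducible.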

best,
--titus

On Tue, Nov 05, 2013 at 01:53:52PM +0300, Mamoon Rashid wrote:
> Dear khmer users,
> Please let me know an easy way to reduce my dataset (13.9 million PE
> reads), which shows a lot of duplicates in the FastQC plots (attached).
> This resulted in a very fragmented assembly using velvet.
> Any suggestions are most welcome.
> Best regards
> Mamoon
> 
> 
> -- 
> *Mamoon Rashid, PhD*
> Post-Doctoral Fellow
> Marine Microbial Ecology Lab
> Red Sea Research Center
> 4700 King Abdullah University of Science and Technology (KAUST)
> Room 2217-WS03, Ibn Al-Haytham Building (2)
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> Office: +966 (0) 2 808-2671

> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer

-- 
C. Titus Brown, ctb at msu.edu