[khmer] suggestions for digital normalization parameters?
sjmiller at email.arizona.edu
Tue May 28 17:18:32 PDT 2013
I've run the khmer digital normalization steps on very high-coverage
Illumina HiSeq 2000 data (140M paired-end reads for a 1.2 Mb bacterial
genome plus host insect DNA) in preparation for de novo assembly. I
used the pipeline suggested here:
normalize-by-median C=20 (~25% of reads eliminated)
strip-and-split-for-assembly (~50% of remaining reads eliminated)
filter-abund C=1 (not much reduction)
normalize-by-median C=5 (~12% of remaining reads eliminated)
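
For reference, the cumulative effect of those percentages can be tallied with a short Python sketch. It assumes the 140M figure counts individual reads, that each percentage applies to the reads left after the previous step, and that "not much reduction" for filter-abund is treated as 0%. The function and variable names are illustrative only and are not part of khmer.

```python
# Rough tally of reads surviving each step, using the approximate
# percentages quoted above. Assumption: filter-abund removes ~0%.

START_READS = 140_000_000  # assumed to be the total read count

# (step name, fraction of the remaining reads eliminated)
STEPS = [
    ("normalize-by-median C=20", 0.25),
    ("strip-and-split-for-assembly", 0.50),
    ("filter-abund C=1", 0.00),  # "not much reduction", taken as zero
    ("normalize-by-median C=5", 0.12),
]


def remaining_after(start, steps):
    """Return (step name, reads remaining) after each step in order."""
    counts = []
    current = start
    for name, fraction_removed in steps:
        current = current * (1 - fraction_removed)
        counts.append((name, round(current)))
    return counts


counts = remaining_after(START_READS, STEPS)
for name, reads_left in counts:
    print(f"{name}: ~{reads_left / 1e6:.1f}M reads remain")
```

Under those assumptions, roughly 46M reads, about a third of the input, go into the assembler.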
Running the Ray assembler with a k-mer sweep of 21..49 gives quick
assemblies (~30 min on 144 processors), but the maximum contig length is
only 1081 bases. In an earlier run on a similar sample, with lower
coverage coming from Illumina, I was able to get an assembled contig of
length 52926 without digital normalization.
Would you recommend different diginorm parameters to try to get longer
contigs? Or should I try partitioning instead?
Thanks for any ideas,