[khmer] suggestions for digital normalization parameters?
Susan Miller
sjmiller at email.arizona.edu
Tue May 28 17:18:32 PDT 2013
I've run khmer digital normalization steps on very high-coverage
Illumina HiSeq2000 data (140M paired-end reads for a 1.2 Mb bacterial
genome plus host insect DNA) in preparation for de novo assembly. I
used the pipeline suggested here:
https://khmer.readthedocs.org/en/latest/guide.html#genome-assembly-including-mda-samples-and-highly-polymorphic-genomes
normalize-by-median C=20 (~25% of reads eliminated)
strip-and-split-for-assembly (~50% of remaining reads eliminated)
filter-abund C=1 (not much reduction)
strip-and-split-for-assembly
normalize-by-median C=5 (~12% of remaining reads eliminated)
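For readers unfamiliar with what the C value controls, here is a minimal sketch of the median k-mer rule behind normalize-by-median: a read is kept only if the median count of its k-mers, over the reads kept so far, is below C, and only kept reads are added to the counts. This is an illustration under stated assumptions, not khmer's implementation: the function name normalize_by_median and the toy reads are hypothetical, it uses an exact Counter instead of khmer's probabilistic counting table, and it does not merge k-mers with their reverse complements.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (nothing if len(seq) < k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_by_median(reads, cutoff, k):
    """Hypothetical sketch of median-based digital normalization.

    Keeps a read only if the median count of its k-mers among previously
    kept reads is below `cutoff`; kept reads then update the counts.
    Assumption: exact counting with a Counter, no reverse-complement merging.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        med = median(counts[km] for km in read_kmers)  # missing k-mers count as 0
        if med < cutoff:
            counts.update(read_kmers)  # only kept reads contribute to coverage
            kept.append(read)
    return kept


# Toy data: 50 copies of one read plus one unrelated read.
reads = ["ACGTACGTAC"] * 50 + ["TTTTGGGGCCCCAAAA"]
kept = normalize_by_median(reads, cutoff=20, k=5)
print(len(kept), "of", len(reads), "reads kept")
```

Redundant copies stop being kept once their k-mers reach the cutoff, while the novel read is always retained. A lower C (such as the C=5 pass) therefore discards more reads from already well-covered regions.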
Running the Ray assembler with a k-mer sweep of 21..49 gives quick
assemblies (~30 min on 144 processors), but the maximum contig length is
only 1081 bases. In an earlier run on a similar sample with lower-coverage
Illumina data, I obtained a contig of 52,926 bases without digital
normalization.
Would you recommend different diginorm parameters to try to get longer
contigs? Or should I try partitioning instead?
Thanks for any ideas,
Susan Miller
ARL BioComputing