[khmer] digital normalization clarification
Susan Miller
sjmiller at email.arizona.edu
Mon May 20 14:23:35 PDT 2013
I have an Illumina HiSeq2000 data set with ~140M paired-end reads from a
bacterial genome with some insect host DNA. I would like to use digital
normalization to reduce this data set in preparation for de novo
assembly. I see two slightly different suggestions, one in the
khmer.readthedocs page:
https://khmer.readthedocs.org/en/latest/guide.html#genome-assembly-including-mda-samples-and-highly-polymorphic-genomes
and the other in the angus/diginorm-2012 tutorial:
http://ged.msu.edu/angus/diginorm-2012/tutorial.html
The khmer.readthedocs page (under
genome-assembly-including-mda-samples-and-highly-polymorphic-genomes)
suggests running normalize-by-median, and filter-abund, followed by
strip-and-split-for-assembly and another normalize-by-median.
The angus diginorm tutorial differs in the three-pass instructions, as
it shows the 2nd normalize-by-median being done before
strip-and-split-for-assembly.
Does it make a difference whether strip-and-split-for-assembly is run
before or after the 2nd normalize-by-median step?
Not having /1 and /2 in the read names didn't seem to be a problem for
normalize-by-median, but strip-and-split-for-assembly is unable to
detect that my reads are paired. If I need to go back to the original
reads and add the paired end /1 and /2 suffixes, it would be nice if the
"Preparing your sequences" section of khmer.readthedocs specified that.
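For reference, a minimal sketch of the kind of renaming this would
involve, assuming 4-line FASTQ records and that the read name is the
first whitespace-delimited token of each header line. The function names
add_pair_suffix and rename_fastq are hypothetical helpers, not part of
khmer, and anything after the first space in the header is dropped.

```python
def add_pair_suffix(header, read_number):
    """Return the read name from a FASTQ header with /1 or /2 appended.

    Assumption: the read name is the first whitespace-delimited token.
    Any trailing comment (e.g. "1:N:0:ATCACG") is discarded.
    """
    name = header.rstrip("\n").split()[0]
    # Leave names alone if they already carry a pair suffix.
    if name.endswith("/1") or name.endswith("/2"):
        return name
    return "{}/{}".format(name, read_number)


def rename_fastq(lines, read_number):
    """Yield FASTQ lines, suffixing the header of each 4-line record."""
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Every fourth line, starting at the first, is a header.
            yield add_pair_suffix(line, read_number)
        else:
            yield line
```

One would run rename_fastq over the first-read file with read_number=1
and over the second-read file with read_number=2, then write each
yielded line back out followed by a newline.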
Thanks,
Susan Miller
Arizona Research Labs BioComputing