[khmer] diginorm on merged reads

John Stanton-Geddes johnsg at uvm.edu
Fri Jul 26 13:10:36 PDT 2013


Hi Titus and the khmer list,
I'm working on transcriptome assembly with samples treated at 12 different
temperatures to capture genes expressed across the thermal range of my
favorite ant species. I pooled the samples and ran them in a single lane of
100 bp paired-end HiSeq, so I have about 16 million reads per sample, 160
million reads total.

My question:
Is there any benefit to merging my paired-end reads (e.g. using FLASH,
http://bioinformatics.oxfordjournals.org/content/early/2011/09/07/bioinformatics.btr507)
prior to running diginorm? A preliminary run of FLASH on some of my samples
showed that about 65% of reads were merged (which is a bit surprising, since
the library was supposed to have been size-selected at 200 bp).

My thought is to run diginorm on the merged reads, and separately on the
un-merged reads using the `-p` option as documented previously (
http://lists.idyll.org/pipermail/khmer/2013-July/000123.html). I'd then
combine the two outputs and run a second pass of diginorm.

Is this a valid approach, or is merging reads redundant with what diginorm
does (since reads that add extra coverage would be tossed out anyway)?
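To make the redundancy question concrete, here is a minimal toy sketch of the
idea behind digital normalization as I understand it: a read is kept only if
the median count of its k-mers, among the reads kept so far, is below a
coverage cutoff. This is not khmer's implementation. The function names are
hypothetical, the counts are exact rather than stored in khmer's probabilistic
counting table, reverse complements are ignored, and the values of `k` and
`cutoff` are arbitrary toy settings.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_diginorm(reads, k=4, cutoff=2):
    """Keep a read only if its median k-mer count so far is below cutoff.

    Toy illustration only (hypothetical helper, not part of khmer).
    """
    counts = Counter()  # exact k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k, nothing to estimate
        # Median k-mer abundance serves as the coverage estimate for the read.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads add to the counts
    return kept


# Five identical reads plus one unrelated read.
reads = ["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"]
kept = toy_diginorm(reads)
```

In this toy, copies of the first read stop being kept once its k-mers reach the
cutoff, while the unrelated read is still kept. That is the sense in which
extra coverage gets tossed out regardless of whether the reads were merged
first.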

Apologies if this is a noob question.

Thanks for the software!

-John

-- 
Postdoctoral Research Associate
Department of Biology, University of Vermont
Room 211, Marsh Life Science Building
109 Carrigan Drive
Burlington, Vermont 05405
www.johnstantongeddes.org