[khmer] diginorm on merged reads
C. Titus Brown
ctb at msu.edu
Mon Oct 7 08:11:39 PDT 2013
Excellent, thank you for the followup! Please let me know when you
publish (or open your data) as it'd be nice to put together a protocol
for this kind of thing.
cheers,
--titus
On Mon, Oct 07, 2013 at 11:05:53AM -0400, John Stanton-Geddes wrote:
> Hi Titus and all,
> Following up on my previous question: I ran a few assemblies to explore
> the effect of combining khmer digital normalization with FLASH for
> merging paired reads. I compared (1) running diginorm only, (2) running
> diginorm and then attempting to merge the still-paired reads with FLASH,
> and (3) first attempting to merge paired reads with FLASH and then
> running diginorm. In all cases, I used trimmed-and-filtered reads and
> assembled with velvet-oases at a k-mer size of 21. Assembly statistics
> are below.
>
> 1) diginorm only
>
> assembly stat            result
> ---------------------    ------------
> Total Contigs            126812
> Total Trimmed Contigs    126781
> Total Length             109476821
> Min contig size          100
> Median contig size       365
> Mean contig size         863
> Max contig size          14314
> N50 Contig               16370
> N50 Length               1933
> N90 Contig               66842
> N90 Length               333
>
> 2) diginorm then FLASH
>
> assembly stat            result
> ---------------------    ------------
> Total Contigs            111434
> Total Trimmed Contigs    111413
> Total Length             111343478
> Min contig size          100
> Median contig size       447
> Mean contig size         999
> Max contig size          20427
> N50 Contig               15236
> N50 Length               2163
> N90 Contig               58158
> N90 Length               410
>
>
> 3) FLASH then diginorm
>
> assembly stat            result
> ---------------------    ------------
> Total Contigs            90612
> Total Trimmed Contigs    90612
> Total Length             86485229
> Min contig size          119
> Median contig size       586
> Mean contig size         954
> Max contig size          14006
> N50 Contig               16436
> N50 Length               1506
> N90 Contig               60314
> N90 Length               396
>
>
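For anyone comparing their own assemblies against the tables above, here
is a minimal Python sketch of how N50/N90-style statistics are commonly
computed. It assumes that "N50 Contig" is the number of contigs (sorted
longest first) needed to reach 50% of the total length and that "N50
Length" is the length of the contig at that point; the script that
produced the tables is not named in this thread, so that reading is an
assumption. The function name nx_stats and the toy lengths are
hypothetical.

```python
def nx_stats(lengths, fraction=0.5):
    """Return (contig_count, contig_length) at the point where contigs,
    taken longest first, first cover `fraction` of the total length.

    Assumption: this mirrors the "N50 Contig" / "N50 Length" rows above;
    the tool that generated those rows is not identified in the thread.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return count, length
    # Only reachable if fraction > 1, which is not a meaningful request.
    raise ValueError("fraction must be between 0 and 1")


# Toy contig lengths (hypothetical, not taken from the assemblies above).
toy_lengths = [100, 200, 300, 400, 500]
n50_count, n50_length = nx_stats(toy_lengths, 0.5)
n90_count, n90_length = nx_stats(toy_lengths, 0.9)
print("N50:", n50_count, n50_length)
print("N90:", n90_count, n90_length)
```

With real data, the lengths would come from the assembler's contig FASTA
file rather than a hard-coded list.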
> It's interesting, and seems to make sense, that merging reads prior to
> diginorm results in the assembly with the fewest contigs (FYI: based on
> the closest genome for this species, I expect ~17k genes, so every
> assembly here has far more transcripts than genes). I'm leaning towards
> using this as my final assembly, since having fewer and longer contigs
> (longer than diginorm alone by median and mean, at least) seems
> preferable.
>
> thanks,
> John
>
>
>
> On Fri, Jul 26, 2013 at 4:10 PM, John Stanton-Geddes <johnsg at uvm.edu> wrote:
>
> > Hi Titus and the khmer list,
> > I'm working on transcriptome assembly with samples treated at 12 different
> > temperatures to capture genes expressed across the thermal range of my
> > favorite ant species. I pooled the samples and ran them in a single lane of
> > 100 bp paired-end HiSeq, so I have about 16 million reads per sample, 160
> > million reads total.
> >
> > My question:
> > is there any benefit to merging my paired-end reads (e.g. using FLASH
> > http://bioinformatics.oxfordjournals.org/content/early/2011/09/07/bioinformatics.btr507)
> > prior to running diginorm? A preliminary run of FLASH on some of my samples
> > showed that about 65% of reads are merged (which is a bit surprising since
> > the library was supposed to have been size-selected at 200 bp).
> >
> > My thought is to run diginorm on the merged reads, and also on the
> > un-merged reads using the `-p` option as documented previously (
> > http://lists.idyll.org/pipermail/khmer/2013-July/000123.html). I'd then
> > combine all these and run a second pass of diginorm.
> >
> > Is this a valid approach, or is merging reads redundant with what diginorm
> > does (since reads that add extra coverage would be tossed out anyway)?
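To make the "tossed out anyway" reasoning concrete, here is a toy Python
sketch of the median-k-mer-abundance idea behind digital normalization:
a read is kept only if the median count of its k-mers, among reads kept
so far, is below a cutoff. This is a simplified illustration, not
khmer's implementation. It assumes an exact in-memory counter instead of
a probabilistic counting structure, and it ignores read pairing and
reverse complements. The function names and the values of k and cutoff
are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_diginorm(reads, k=4, cutoff=2):
    """Keep a read only if the median abundance of its k-mers, counted
    over the reads kept so far, is below `cutoff`.

    Assumption: exact counting with a Counter; single-end reads only.
    """
    counts = Counter()
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read is shorter than k, nothing to count
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)  # only kept reads add coverage
    return kept


# Five copies of one read plus one unrelated read (toy sequences).
reads = ["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"]
kept = toy_diginorm(reads, k=4, cutoff=2)
print(len(kept), "of", len(reads), "reads kept")
```

In this toy model a merged read is just a longer read, so it contributes
more k-mers per kept read; whether that changes the final assembly is an
empirical question that the sketch does not answer.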
> >
> > Apologies if this is a noob question.
> >
> > Thanks for the software!
> >
> > -John
> >
> > --
> > Postdoctoral Research Associate
> > Department of Biology, University of Vermont
> > Room 211, Marsh Life Science Building
> > 109 Carrigan Drive
> > Burlington, Vermont 05405
> > www.johnstantongeddes.org
> >
>
>
>
> --
> Postdoctoral Research Associate
> Department of Biology, University of Vermont
> Room 211, Marsh Life Science Building
> 109 Carrigan Drive
> Burlington, Vermont 05405
> www.johnstantongeddes.org
--
C. Titus Brown, ctb at msu.edu