[khmer] diginorm on merged reads

C. Titus Brown ctb at msu.edu
Mon Oct 7 08:11:39 PDT 2013


Excellent, thank you for the followup!  Please let me know when you
publish (or open your data) as it'd be nice to put together a protocol
for this kind of thing.

cheers,
--titus

On Mon, Oct 07, 2013 at 11:05:53AM -0400, John Stanton-Geddes wrote:
> Hi Titus and all,
> Following up on my previous question - I ran a few different assemblies
> exploring the effect of using khmer digital normalization and FLASH to
> merge short reads. I compared the results of (1) running diginorm only, (2)
> running diginorm, then attempting to merge still-paired reads with FLASH,
> and (3) first attempting to merge paired reads with FLASH, followed by
> diginorm. In all cases, I used trimmed-and-filtered reads and performed
> assembly using velvet-oases with a k-mer size of 21. Below are some assembly
> statistics.
> 
> 1) diginorm only
> 
> assembly stat            result
> ---------------------    ---------
> Total Contigs            126812
> Total Trimmed Contigs    126781
> Total Length             109476821
> Min contig size          100
> Median contig size       365
> Mean contig size         863
> Max contig size          14314
> N50 Contig               16370
> N50 Length               1933
> N90 Contig               66842
> N90 Length               333
> 
> 2) diginorm then FLASH
> 
> assembly stat            result
> ---------------------    ---------
> Total Contigs            111434
> Total Trimmed Contigs    111413
> Total Length             111343478
> Min contig size          100
> Median contig size       447
> Mean contig size         999
> Max contig size          20427
> N50 Contig               15236
> N50 Length               2163
> N90 Contig               58158
> N90 Length               410
> 
> 
> 3) FLASH then diginorm
> 
> assembly stat            result
> ---------------------    ---------
> Total Contigs            90612
> Total Trimmed Contigs    90612
> Total Length             86485229
> Min contig size          119
> Median contig size       586
> Mean contig size         954
> Max contig size          14006
> N50 Contig               16436
> N50 Length               1506
> N90 Contig               60314
> N90 Length               396
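
For reference, the N50 and N90 rows can be read as follows: sort the contigs from longest to shortest and accumulate their lengths until 50% (or 90%) of the total assembly length is covered. The sketch below assumes that "N50 Contig" is the number of contigs needed to reach that point and "N50 Length" is the length of the last contig added. The function name `nx_stat` and the toy lengths are illustrative only; this is not the script that produced the tables above.

```python
def nx_stat(lengths, fraction=0.5):
    """Return (contig_count, contig_length) for an Nx statistic.

    fraction=0.5 gives N50, fraction=0.9 gives N90.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    # Longest contigs first, as in the usual N50 definition.
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            # 'count' contigs cover the target; 'length' is the shortest of them.
            return count, length


# Toy example with made-up contig lengths (not data from this thread).
toy_lengths = [100, 200, 300, 400, 500]
n50_count, n50_length = nx_stat(toy_lengths, 0.5)
n90_count, n90_length = nx_stat(toy_lengths, 0.9)
print(n50_count, n50_length)
print(n90_count, n90_length)
```

Under this reading, a higher "N50 Length" with a lower "N50 Contig" means that fewer, longer contigs account for half of the assembled bases.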
> 
> 
> It's interesting, and seems to make sense, that merging reads prior to
> diginorm results in the assembly with the fewest contigs (FYI - based on
> the closest genome for this species, I expect ~17k genes, so all three
> assemblies still have far more contigs than genes). I'm leaning towards
> using this as my final assembly, as having fewer and (at least compared
> with diginorm alone, by mean and median size) longer contigs seems preferable.
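
As a quick sanity check on the numbers quoted above, the reported mean contig size should equal total length divided by total contigs, and the contig counts can be compared against the ~17k expected genes. This is a minimal sketch using only figures from this message; the dictionary layout and names are illustrative.

```python
EXPECTED_GENES = 17_000  # "~17k genes" from the message; an approximate figure

# Total contigs and total length as reported in the three tables.
assemblies = {
    "diginorm only": {"contigs": 126812, "total_length": 109476821},
    "diginorm then FLASH": {"contigs": 111434, "total_length": 111343478},
    "FLASH then diginorm": {"contigs": 90612, "total_length": 86485229},
}


def summarize(stats):
    """Return (mean contig size, contigs per expected gene)."""
    mean_size = stats["total_length"] / stats["contigs"]
    contigs_per_gene = stats["contigs"] / EXPECTED_GENES
    return mean_size, contigs_per_gene


summary = {name: summarize(stats) for name, stats in assemblies.items()}

for name, (mean_size, per_gene) in summary.items():
    print(f"{name}: mean contig size {mean_size:.0f} bp, "
          f"{per_gene:.1f} contigs per expected gene")
```

The recomputed means agree with the reported 863, 999 and 954, and even the smallest assembly has roughly five contigs per expected gene.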
> 
> thanks,
> John
> 
> 
> 
> On Fri, Jul 26, 2013 at 4:10 PM, John Stanton-Geddes <johnsg at uvm.edu> wrote:
> 
> > Hi Titus and the khmer list,
> > I'm working on transcriptome assembly with samples treated at 12 different
> > temperatures to capture genes expressed across the thermal range of my
> > favorite ant species. I pooled the samples and ran them in a single lane of
> > 100 bp paired-end HiSeq, so I have about 16 million reads per sample, 160
> > million reads total.
> >
> > My question:
> > Is there any benefit to merging my paired-end reads (e.g. using FLASH
> > http://bioinformatics.oxfordjournals.org/content/early/2011/09/07/bioinformatics.btr507)
> > prior to running diginorm? A preliminary run of FLASH on some of my samples
> > showed that about 65% of reads are merged (which is a bit surprising since
> > the library was supposed to have been size-selected at 200 bp).
> >
> > My thought is to run diginorm on the merged reads, and also on the
> > un-merged reads using the `-p` option as documented previously (
> > http://lists.idyll.org/pipermail/khmer/2013-July/000123.html). I'd then
> > combine all these and run a second pass of diginorm.
> >
> > Is this a valid approach, or is merging reads redundant with what diginorm
> > does (since reads that add extra coverage would be tossed out anyway)?
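
To make the parenthetical above concrete, here is a toy sketch of the median k-mer abundance rule behind digital normalization: a read is kept only if the median count of its k-mers, among the reads kept so far, is below a cutoff. This is an illustration, not khmer's implementation (khmer uses a memory-efficient probabilistic counting structure rather than an exact dictionary). The function names, the tiny k and the cutoff are assumptions chosen for the example.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_diginorm(reads, k=4, cutoff=2):
    """Keep a read only if its median k-mer count so far is below cutoff."""
    counts = Counter()  # exact counts; fine for a toy, too big for real data
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:
            continue  # read shorter than k: nothing to count
        if median(counts[w] for w in words) < cutoff:
            kept.append(read)
            counts.update(words)  # only kept reads contribute coverage
    return kept


# Made-up reads: five copies of one sequence plus one novel sequence.
redundant = "ACGTTGCAAG"
novel = "TTTTGGGGCC"
kept_reads = toy_diginorm([redundant] * 5 + [novel], k=4, cutoff=2)
print(kept_reads)
```

Here the third and later copies of the repeated read are dropped because their k-mers have already been seen twice, while the novel read is kept.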
> >
> > Apologies if this is a noob question.
> >
> > Thanks for the software!
> >
> > -John
> >
> > --
> > Postdoctoral Research Associate
> > Department of Biology, University of Vermont
> > Room 211, Marsh Life Science Building
> > 109 Carrigan Drive
> > Burlington, Vermont 05405
> > www.johnstantongeddes.org
> >
> 
> 
> 



-- 
C. Titus Brown, ctb at msu.edu



