[khmer] Fwd: split-and-strip script ... support for current Illumina headers?

C. Titus Brown ctb at msu.edu
Mon Jul 22 05:35:29 PDT 2013


Sorry, didn't realize this conversation was on the old librelist mailing list!

Begin forwarded message:

> From: "C. Titus Brown" <ctb at msu.edu>
> Subject: Re: [khmer] split-and-strip script ... support for current Illumina headers?
> Date: July 19, 2013 3:43:44 PM EDT
> To: khmer at librelist.com
> Reply-To: titus at idyll.org
> 
> On Fri, Jul 19, 2013 at 12:20:32PM -0700, Joseph Fass wrote:
>> I may be missing something, but I've cloned the current khmer git
>> repository and installed it, and tried the split-and-strip script at the
>> end of a digital normalization three-stage run. The headers of the reads
>> (in the fasta file resulting from normalization) are identical for a
>> forward and reverse read, because (apparently) anything after the first
>> whitespace has been truncated. I'd asked about this in a comment on the
>> tutorial page, and was told that it had been fixed, but I'm not seeing any
>> change. Am I missing something?
>> 
>> Also, another question about the tutorial: any reason it demonstrates
>> normalizing down to a median coverage of 5? Isn't this way below Velvet's
>> target k-mer coverage range of 20-30x?
> 
> Hi Joe,
> 
> Sorry I didn't respond to you on Disqus yet :)
> 
> First, unfortunately, we have not kept up well with Illumina's various
> formatting changes.  The sandbox/interleave script will "fix" your
> initial PE files to have /1 and /2 at the end of the first field in
> the FASTA header, or you can tack 'em on yourself.  The processing
> scripts under the bleeding-edge branch tend to be more aware of these
> issues, but they are still far from handling them well.  We're definitely
> aware of this and working to fix it.
> 
> For the second issue, diginorm actually changes the meaning of coverage
> from 'random' to 'systematic'; see the graphs here:
> 
> http://ivory.idyll.org/blog/what-is-diginorm.html
> 
> For genomic DNA, you could subsample your data quite easily down to 30x but it
> would still be random coverage -- bad juju, you'd miss a lot of stuff.  With
> diginorm you chop off the high coverage while retaining the low coverage.
> 
> cheers,
> --titus
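
The "tack 'em on yourself" fix from the reply above can be sketched as follows. This is a minimal illustration, not the sandbox/interleave script itself. It assumes headers in the newer Illumina style, where the read number (1 or 2) is the first colon-separated item after the first whitespace. The function name add_pair_suffix is hypothetical.

```python
def add_pair_suffix(header):
    """Return the first header field with /1 or /2 appended.

    Assumes a header such as
        >HWI-ST1:1:FC:1:1101:1000:2000 1:N:0:ACGT
    where the text after the first whitespace starts with the read number.
    """
    parts = header.strip().split(None, 1)
    name = parts[0]
    # Old-style headers already carry the pair suffix; leave them alone.
    if name.endswith(("/1", "/2")):
        return name
    if len(parts) < 2:
        raise ValueError("no read number found in header: %r" % header)
    read_number = parts[1].split(":", 1)[0]
    if read_number not in ("1", "2"):
        raise ValueError("unexpected read number in header: %r" % header)
    return "%s/%s" % (name, read_number)


if __name__ == "__main__":
    print(add_pair_suffix(">HWI-ST1:1:FC:1:1101:1000:2000 1:N:0:ACGT"))
    print(add_pair_suffix(">HWI-ST1:1:FC:1:1101:1000:2000 2:N:0:ACGT"))
```

Because the suffix becomes part of the first field, it survives any tool that truncates the header at the first whitespace.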

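The difference between random subsampling and diginorm described in the reply can be seen in a toy version of median-based normalization. This is only a sketch of the idea: a read is kept when the median count of its k-mers, among reads kept so far, is below a cutoff. It uses an exact in-memory counter and tiny k for illustration, which is an assumption of this example and not how khmer stores counts. The names kmers and normalize_by_median_toy are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median_toy(reads, k=4, cutoff=5):
    """Keep a read only while its estimated coverage is below cutoff.

    Coverage of a read is estimated as the median count of its k-mers
    among the reads kept so far.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    high = "ACGTTGCAAGGCTA"  # stands in for a high-coverage region
    low = "TTTTCCCCGGGGAA"   # stands in for a low-coverage region
    reads = [high] * 100 + [low] * 2
    kept = normalize_by_median_toy(reads, k=4, cutoff=5)
    print(kept.count(high), kept.count(low))
```

In this toy input the 100 high-coverage reads are cut down to the cutoff while both low-coverage reads are retained. A uniform random subsample of the same size would usually lose the two low-coverage reads.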