[khmer] khmer stripped header information from RNA-seq reads, rendering them unusable
C. Titus Brown
ctb at msu.edu
Fri Jul 18 04:58:39 PDT 2014
On Thu, Jul 17, 2014 at 08:42:18PM +0000, Erich Marquard Schwarz wrote:
> On Jul 17, 2014, at 4:21 PM, C. Titus Brown <ctb at msu.edu> wrote:
>
> > For now, we've settled on the following set of advice:
> > https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/index.html
> >
> > Note the use of scripts in
> > https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/1-quality.html
>
> Fair enough.
>
> It might not be a bad idea to note, explicitly in the docs, that interleave-reads.py and extract-paired-reads.py have the side effect of appending '/1' and '/2' to reads, and that this behavior is *necessary* in order to avoid problems down the road with khmer stripping away the now-standard Illumina format of spaced suffixes (" 1" and " 2"). Although I am sure that this seems obvious behavior to people who've been developing khmer for, literally, years, it is *not* obvious behavior in fact, and it's a nontrivial detail. So adding that warning to the docs might be time well spent.
+1
> > We are nearing a point where we can fix this behavior in khmer itself, but
> > in the past the parsing code has been a major pain point in terms of new
> > bugs, so we've avoided making too many changes there...
>
> Sure, that makes sense. Not a huge deal on my end. I have been tending to retro-suffix my Illumina reads anyway, since doing so avoids problems with programs like bowtie that were developed in the paleo Illumina suffix era. I just made the mistake of trying to be a young hipster this time.
+1
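For anyone else hitting this, here is a minimal sketch of the retro-suffixing Erich describes. It is an illustration, not khmer code: the names `retro_suffix` and `retro_suffix_fastq` are made up for this example, and it assumes plain four-line FASTQ records whose headers end in either a bare " 1"/" 2" or the longer " 1:N:0:INDEX" form. Headers that already carry '/1' or '/2' are left alone.

```python
import re

# Hypothetical helpers for illustration only; these are not part of khmer.
# Matches "@name 1", "@name 2", or "@name 1:N:0:ACGT" style headers.
_SPACED_SUFFIX = re.compile(r"^(@\S+)\s+([12])(?::\S*)?$")


def retro_suffix(header):
    """Rewrite a spaced read-number suffix as an old-style /1 or /2 suffix."""
    header = header.rstrip("\n")
    match = _SPACED_SUFFIX.match(header)
    if match is None:
        # Not a spaced-suffix header (e.g. already "/1"); leave unchanged.
        return header
    name, read_number = match.groups()
    return "{}/{}".format(name, read_number)


def retro_suffix_fastq(lines):
    """Yield FASTQ lines, rewriting only the header of each 4-line record."""
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Assumes strict 4-line records, so quality lines that happen to
        # start with "@" are never mistaken for headers.
        yield retro_suffix(line) if index % 4 == 0 else line
```

Running the reads through something like this before interleaving keeps the pair information in the read name itself, so it survives tools that drop everything after the first space.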
--titus