[khmer] khmer stripped header information from RNA-seq reads, rendering them unusable

Erich Marquard Schwarz ems394 at cornell.edu
Thu Jul 17 11:36:40 PDT 2014


On Jul 17, 2014, at 2:25 PM, Philipp Schiffer <philipp.schiffer at gmail.com> wrote:

> you can indeed do whatever you like there. However, as I tried to indicate, it might really make sense to go with -1, -2 or /1, /2.
> My guess is that a lot of scripts could struggle with the "#" you are using.

    Any script that can handle older-format Illumina reads will do fine, which is all of mine, along with many standard programs (e.g., older bowtie works fine with the older format, because that older format was the standard when bowtie was first designed!).  So, you may be right in many cases, but in this case I don't really need to worry about using the older Illumina format.  As with all things Unix, it is a matter of *which* pesky details are going to be lethal in a given context.
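    For readers who do need to move between the two conventions, here is a minimal sketch of rewriting a newer-style Illumina header into the older "#index/read" form. It assumes the newer header looks like "@<name> <read>:<filtered>:<control>:<index>" and that an empty index should become "0"; the function name to_old_style is hypothetical and is not part of khmer or of the scripts discussed in this thread.

```python
import re

# Assumed newer-style layout: "@<name> <read>:<filtered>:<control>:<index>"
_NEW_STYLE = re.compile(r"^@(\S+) ([12]):[YN]:\d+:(\S*)$")


def to_old_style(header):
    """Rewrite a newer-style header as '@name#index/read'.

    Headers that do not match the assumed newer-style layout
    (including headers already in the older form) are returned unchanged.
    """
    header = header.rstrip("\n")
    match = _NEW_STYLE.match(header)
    if match is None:
        return header
    name, read, index = match.groups()
    # Assumption: an empty index field is written as "0" in the older form.
    return "@{}#{}/{}".format(name, index or "0", read)
```

    Because the read number ends up inside the first whitespace-delimited token, a tool that truncates the header at the first space should still leave the pairing information intact.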


> Meanwhile it is also possible to just "repair" the reads in the .keep file by comparison with the raw reads file where headers have been fixed. Might save some time....

    Ugh.  I recognize that you are correct, and in some circumstances I would do that, but I think trying to fix a munged file is inherently more error-prone than just making a file that will be bullet-proof.  Again, you're certainly not wrong, but it's a question of what particular gotchas one is trying to steer clear of.
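    For reference, the repair approach suggested above might look roughly like the sketch below. It rests on several assumptions that are not confirmed anywhere in this thread: that the fixed raw headers end in "/1" or "/2", that the stripped headers in the .keep file are exactly the part before that suffix, and that the sequences themselves were not altered. The function name restore_headers is hypothetical, and parsing the files into (header, sequence) pairs is left to the caller.

```python
from collections import defaultdict


def restore_headers(stripped, fixed):
    """Restore pair suffixes on stripped headers by lookup in fixed records.

    stripped: iterable of (header, sequence) whose headers lack pair info.
    fixed:    iterable of (header, sequence) whose headers end in /1 or /2.
    Returns a list of (header, sequence); unmatched records keep their header.
    """
    lookup = defaultdict(list)
    for header, seq in fixed:
        # Assumption: everything before the final "/" is the shared read name.
        base = header.rsplit("/", 1)[0]
        lookup[(base, seq)].append(header)

    restored = []
    for header, seq in stripped:
        candidates = lookup.get((header, seq))
        if candidates:
            # Mates with identical sequences are handed out in input order,
            # which is a guess; that case is genuinely ambiguous.
            restored.append((candidates.pop(0), seq))
        else:
            restored.append((header, seq))
    return restored
```

    Matching on both the name and the sequence is what tells the two mates apart once the suffix is gone, and it is also where this approach gets fragile: any trimming of the reads would break the lookup.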

    I recently told a bioinformatics class in Yerevan, Armenia: "A great deal of bioinformatics consists of converting data from one file format to another, rather than actually doing computations on the data."  Sad, but true.


> Good luck

    Thanks!


--Erich




