[khmer] khmer stripped header information from RNA-seq reads, rendering them unusable

Philipp Schiffer philipp.schiffer at gmail.com
Thu Jul 17 11:00:43 PDT 2014


Hi Erich,

I guess this is supposed to happen, and it won't just be a problem for 
khmer but for other tools as well (e.g. bowtie and, I think, samtools). 
I strongly recommend formatting your headers with a trailing -1, -2 or 
/1, /2 (no spaces!), for example:

DHKW5DQ1:285:D1T8EACXX:7:1101:1397:2177-1

DHKW5DQ1:285:D1T8EACXX:7:1101:1397:2177/1

Do this from the beginning and always stick to this rule.
A simple perl oneliner will help.
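
If Perl is not at hand, the same renaming can be sketched in Python. This 
is only a rough sketch: it assumes FASTA headers in the new Illumina 
layout shown in Erich's example (read ID, a space, then 
mate:filter:control:index), and the function names are made up for 
illustration. Lines that do not match are passed through untouched.

```python
import re

# Assumed header layout (new Illumina style, as in the quoted example):
#   >READ_ID MATE:FILTER:CONTROL:INDEX
# e.g. >DHKW5DQ1:285:D1T8EACXX:7:1101:1397:2177 1:N:0:TATGTGGC
_HEADER = re.compile(r"^>(\S+)\s+([12]):[YN]:\d+:\S*$")


def rename_header(line, sep="/"):
    """Turn '>ID 1:N:0:INDEX' into '>ID/1'; return other lines unchanged."""
    stripped = line.rstrip("\r\n")
    ending = line[len(stripped):]  # keep the original line ending
    match = _HEADER.match(stripped)
    if match is None:
        return line  # sequence line, or a header in some other format
    read_id, mate = match.groups()
    return ">" + read_id + sep + mate + ending


def rename_headers(lines, sep="/"):
    """Lazily apply rename_header to every line of a FASTA file."""
    for line in lines:
        yield rename_header(line, sep)
```

To use it as a filter, something like 
sys.stdout.writelines(rename_headers(sys.stdin)) should do (after 
importing sys); pass sep="-" if you prefer the -1, -2 style.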

Cheers

Philipp


> Erich Marquard Schwarz <mailto:ems394 at cornell.edu>
> 17 July 2014 18:19
> Hi all,
>
> I used khmer to begin normalizing RNA-seq data with this command:
>
> normalize-by-median.py -k 20 -C 20 -x 2e9 -N 4 --savehash 
> Csp1_rna_2014.07.16.filt.jumbled.kh Csp1_rna_2014.07.16.filt.jumbled.fa ;
>
> which produced Csp1_rna_2014.07.16.filt.jumbled.fa.keep.
>
> Unfortunately, I was not aware that khmer has the nasty side effect of 
> stripping header information. Here are two header texts -- the first 
> from Csp1_rna_2014.07.16.filt.jumbled.fa, the second from its khmer 
> product Csp1_rna_2014.07.16.filt.jumbled.fa.keep:
>
> >DHKW5DQ1:285:D1T8EACXX:7:1101:1397:2177 1:N:0:TATGTGGC
>
> >DHKW5DQ1:285:D1T8EACXX:7:1101:1397:2177
>
> The first header line has paired-end information using Illumina's new 
> format (with trailing ' 1' and ' 2' -- which I agree is less robust 
> than the old-style '#1' and '#2' suffixes that Illumina used to use, 
> but Illumina is the 800-pound gorilla here, and we are its mere 
> servant chimps).
>
> That header-stripping 'feature' of khmer totally trashed my later work 
> on the data. I will have to retroname the reads (give them '#1' and 
> '#2' old-style suffixes) so that I can get khmer to work with them 
> without wrecking their usability for later re-sorting and subsequent 
> uses (in this case, genome RNA-scaffolding).
>
> Lost time, roughly one day.
>
> The version I have of khmer was installed on 9/4/2012. If this 
> side-effect has been fixed since then, that's good news; if not, then 
> it'd be good if it *were* fixed.
>
> Thank you,
>
>
> --Erich
>
>
>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer