[khmer] Issue interleaving reads with cutadapt processed fastq

C. Titus Brown ctbrown at ucdavis.edu
Wed Sep 16 10:33:17 PDT 2015


Hi Will,

yep, my guess is that something got munged in the file format and our FASTQ
parser is screwing up on it.

My suggestion is to move to khmer 2.0 and screed 0.9, because we've fixed a lot
of things with sequence name handling. If you can't, I'm happy to take a closer
look at the file format.
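
If you want to check the files yourself first, something like the following might help. It is only a minimal sketch, not khmer or screed code, and the function names are made up for illustration. It assumes strict 4-line FASTQ records and that the two mates share a name up to the first whitespace or a trailing /1 or /2. It counts the records in each file and reports the positions where the names do not pair up.

```python
import gzip
import itertools


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def fastq_names(path):
    """Yield each record's header line without the '@' (assumes 4 lines per record)."""
    with open_maybe_gzip(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 0:
                if not line.startswith("@"):
                    raise ValueError(f"{path}: line {i + 1} does not start with '@'")
                yield line[1:].rstrip("\n")


def base_name(name):
    """Reduce a read name to the part both mates are assumed to share."""
    parts = name.split()
    name = parts[0] if parts else ""  # drop a comment such as ' 1:N:0:1'
    if name.endswith(("/1", "/2")):
        name = name[:-2]              # drop an old-style /1 or /2 suffix
    return name


def check_pairs(r1_path, r2_path):
    """Return (n_r1, n_r2, mismatches); mismatches holds (record index, name1, name2)."""
    n1 = n2 = 0
    mismatches = []
    pairs = itertools.zip_longest(fastq_names(r1_path), fastq_names(r2_path))
    for idx, (a, b) in enumerate(pairs):
        n1 += a is not None
        n2 += b is not None
        if a is not None and b is not None and base_name(a) != base_name(b):
            mismatches.append((idx, a, b))
    return n1, n2, mismatches
```

Calling check_pairs("R1.fastq.gz", "R2.fastq.gz") with your own file names returns the two record counts and the list of mismatched names. The ValueError on a header that does not start with '@' is a hint that the 4-line layout is broken somewhere. On large files with many mismatches the list can get long, so you may want to stop after the first few.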

best,
--titus

On Wed, Sep 16, 2015 at 01:27:21PM -0400, Will Shoemaker wrote:
> Hello,
> 
> I am unable to merge pairs of MiSeq reads using the khmer script
> interleave-reads.py in khmer version 1.3. The R1 and R2 files have had the
> first 10 bases trimmed off and have been quality filtered using cutadapt
> v1.9.
> 
> Using the command zcat file_name.fastq.gz | echo $((`wc -l`/4)) on each set
> of reads, I found that the number of reads in R1 and R2 is the same.
> 
> The command I'm running is:
> interleave-reads.py -o output.fastq.gz R1.fastq.gz R2.fastq.gz
> (file names changed for readability)
> 
> My OS is Linux 2.6.32-573.3.1.el6.x86_64 x86_64
> 
> Attached is a txt file of the khmer output.
> 
> Could this be an issue of cutadapt changing the file format? I am able to
> run assemblies on cutadapt processed reads.
> 
> 
> Best,
> Will Shoemaker
> -- 
> Will Shoemaker
> Indiana University
> Graduate Student: Lennon Lab
> Evolution, Ecology, & Behavior Program
> Jordan Hall 238
> wrshoema at umail.iu.edu
> @shoemakah <https://twitter.com/shoemakah>

> || This is the script 'interleave-reads.py' in khmer.
> || You are running khmer version 1.3-366-g3ed54b8
> || You are also using screed version 0.8-rc4
> ||
> || If you use this script in a publication, please cite EACH of the following:
> ||
> ||   * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
> ||
> || Please see http://khmer.readthedocs.org/en/latest/citations.html for details.
> 
> Interleaving:
> 	GSF911-711_S1_L001_R1_001_U10_Q30.fastq
> 	GSF911-711_S1_L001_R2_001_U10_Q30.fastq
> ... 0 pairs
> ERROR: Input files contain different number of records.



-- 
C. Titus Brown, ctbrown at ucdavis.edu


