[khmer] Issue interleaving reads with cutadapt processed fastq

C. Titus Brown ctbrown at ucdavis.edu
Thu Sep 17 06:33:51 PDT 2015


Will, if you could try this:

python -c "import screed; print(sum(1 for r in screed.open('file.fq')))"

(replacing file.fq with each of the two files), you'll get a count of
records as read by our sequence parsing code.  If those numbers disagree
then the formatting is off somehow.
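
For anyone without screed at hand, here is a minimal sketch of the same
check using only the Python standard library. It assumes strictly
four-line FASTQ records (no wrapped sequence lines), and the function
names (open_fastq, fastq_stats) are illustrative, not part of khmer or
screed. It also counts records whose sequence line is empty, which is the
case raised further down the thread.

```python
import gzip
import itertools


def open_fastq(path):
    """Open a FASTQ file as text, handling .gz transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fastq_stats(path):
    """Return (records, empty_reads) for a four-line-per-record FASTQ file.

    Assumption: each record is exactly four lines (header, sequence,
    '+' separator, quality). Wrapped FASTQ is not handled.
    """
    records = 0
    empty = 0
    with open_fastq(path) as handle:
        while True:
            chunk = list(itertools.islice(handle, 4))
            if not chunk:
                break  # clean end of file
            if (len(chunk) < 4
                    or not chunk[0].startswith("@")
                    or not chunk[2].startswith("+")):
                raise ValueError(
                    "malformed record starting at line %d of %s"
                    % (records * 4 + 1, path))
            records += 1
            if chunk[1].strip() == "":
                empty += 1  # sequence line holds nothing but whitespace
    return records, empty


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        n_records, n_empty = fastq_stats(path)
        print("%s\t%d records\t%d empty reads" % (path, n_records, n_empty))
```

Run it on both the R1 and R2 files. If the record counts differ, or the
script raises on a malformed record, the formatting is the likely
culprit; a nonzero empty-read count would point at the cutadapt behaviour
discussed below, though this thread does not confirm that empty reads are
what trips interleave-reads.py.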

best,
--titus

On Thu, Sep 17, 2015 at 03:09:36AM +0000, Fields, Christopher J wrote:
> (cc'ing the list back for the reply)
> 
> Not sure re: whether khmer deals with this though it might explain the issue.  You could run a count in your file for any empty lines, something like "zgrep -c '^$' file_name.fastq.gz" (or pipe in the data via zcat if your system doesn't have zgrep).  Not sure if cutadapt does anything silly like leave spaces, so you may need to adjust the grep accordingly.
> 
> Both the sequence and the quality would be empty, so the # of records with empty lines should be count / 2.
> 
> chris
> 
> On Sep 16, 2015, at 9:08 PM, Will Shoemaker <wrshoema at umail.iu.edu> wrote:
> 
> Hi Chris,
> 
> I checked the cutadapt docs and all the versions keep empty fastq reads and they don't have an option to remove them. I don't know how to check for empty fastq reads using bash commands (I don't know what an empty fastq read looks like), but I checked the number of reads in both the original and quality filtered fastq files and they have the same number of reads, so cutadapt is likely keeping empty reads.
> 
> Is this still an issue for the newest version of khmer?
> 
> Best,
> Will
> 
> On Wed, Sep 16, 2015 at 1:35 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
> I may be mis-remembering this, but I recall cutadapt giving empty sequences with paired data before (maybe this has been fixed).  Are any of the sequences empty?
> 
> chris
> 
> On Sep 16, 2015, at 12:27 PM, Will Shoemaker <wrshoema at umail.iu.edu> wrote:
> 
> Hello,
> 
> I am unable to merge pairs of MiSeq reads using the khmer script interleave-reads.py in khmer version 1.3. The R1 and R2 files have had the first 10 bases trimmed off and have been quality filtered using cutadapt v1.9.
> 
> Using the command zcat file_name.fastq.gz | echo $((`wc -l`/4)) on each set of reads, I found that the number of reads in R1 and R2 is the same.
> 
> The command I'm running is:
> interleave-reads.py -o output.fastq.gz R1.fastq.gz R2.fastq.gz   (file names changed for readability)
> 
> My OS is Linux 2.6.32-573.3.1.el6.x86_64 x86_64
> 
> Attached is a txt file of the khmer output.
> 
> Could this be an issue of cutadapt changing the file format? I am able to run assemblies on cutadapt processed reads.
> 
> 
> Best,
> Will Shoemaker
> --
> Will Shoemaker
> Indiana University
> Graduate Student: Lennon Lab
> Evolution, Ecology, & Behavior Program
> Jordan Hall 238
> wrshoema at umail.iu.edu
> @shoemakah <https://twitter.com/shoemakah>
> <khmer_error.txt>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
> 



-- 
C. Titus Brown, ctbrown at ucdavis.edu


