[khmer] Issue interleaving reads with cutadapt processed fastq

Fields, Christopher J cjfields at illinois.edu
Wed Sep 16 20:09:36 PDT 2015


(cc’ing the list back for the reply)

I'm not sure whether khmer handles this, though it might explain the issue.  You could count the empty lines in your file with something like "zgrep -c '^$' file_name.fastq.gz" (or pipe the data in via zcat if your system doesn’t have zgrep).  I'm not sure whether cutadapt does anything silly like leaving spaces, so you may need to adjust the grep pattern accordingly.

Both the sequence and the quality lines would be empty, so the number of empty records should be the count divided by 2.
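
As a rough sketch of that check (the sample file and variable names are made up for illustration, and it assumes plain four-line FASTQ records), something like this counts blank or whitespace-only lines and halves the result. "gzip -dc" is used in place of zcat only for portability:

```shell
# Build a tiny gzipped FASTQ containing one empty record (hypothetical sample data).
tmpdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n@r3\nGG\n+\nII\n' | gzip > "$tmpdir/sample.fastq.gz"

# Count lines that are empty or contain only whitespace.
empty_lines=$(gzip -dc "$tmpdir/sample.fastq.gz" | grep -c '^[[:space:]]*$')

# An empty record has an empty sequence line and an empty quality line.
empty_records=$((empty_lines / 2))

# Cross-check: count records whose sequence line (2nd of every 4 lines) is empty.
empty_seqs=$(gzip -dc "$tmpdir/sample.fastq.gz" | awk 'NR % 4 == 2 && $0 == "" {n++} END {print n + 0}')

echo "empty lines: $empty_lines, empty records: $empty_records, empty sequences: $empty_seqs"
```

If the two numbers disagree on real data, the file probably has blank lines that are not part of empty records, or records that are not exactly four lines long.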

chris

On Sep 16, 2015, at 9:08 PM, Will Shoemaker <wrshoema at umail.iu.edu> wrote:

Hi Chris,

I checked the cutadapt docs and all the versions keep empty fastq reads and they don't have an option to remove them. I don't know how to check for empty fastq reads using bash commands (I don't know what an empty fastq read looks like), but I checked the number of reads in both the original and quality-filtered fastq files and they have the same number of reads, so cutadapt is likely keeping empty reads.

Is this still an issue for the newest version of khmer?

Best,
Will

On Wed, Sep 16, 2015 at 1:35 PM, Fields, Christopher J <cjfields at illinois.edu> wrote:
I may be mis-remembering this, but I recall cutadapt giving empty sequences with paired data before (maybe this has been fixed).  Are any of the sequences empty?

chris

On Sep 16, 2015, at 12:27 PM, Will Shoemaker <wrshoema at umail.iu.edu> wrote:

Hello,

I am unable to merge pairs of MiSeq reads using the khmer script interleave-reads.py in khmer version 1.3. The R1 and R2 files have had the first 10 bases trimmed off and have been quality filtered using cutadapt v1.9.

Using the command zcat file_name.fastq.gz | echo $((`wc -l`/4)) on each set of reads, I found that the number of reads in R1 and R2 is the same.
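
The read-count comparison above could be wrapped up roughly as follows. This is only a sketch: count_reads is a hypothetical helper, the two sample files are stand-ins for the real R1/R2 files, and four-line FASTQ records are assumed.

```shell
# Hypothetical helper: number of FASTQ records in a gzipped file,
# assuming exactly four lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $((lines / 4))
}

# Tiny stand-in files; replace these with the real R1/R2 paths.
tmpdir=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n' | gzip > "$tmpdir/R1.fastq.gz"
printf '@r1/2\nTTAA\n+\nIIII\n@r2/2\n\n+\n\n' | gzip > "$tmpdir/R2.fastq.gz"

r1_count=$(count_reads "$tmpdir/R1.fastq.gz")
r2_count=$(count_reads "$tmpdir/R2.fastq.gz")
echo "R1: $r1_count reads, R2: $r2_count reads"
```

Note that the stand-in R2 file contains a read with an empty sequence and the counts still match, so equal read counts alone do not rule out empty reads.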

The command I'm running is:
interleave-reads.py -o output.fastq.gz R1.fastq.gz R2.fastq.gz   (file names changed for readability)

My OS is Linux 2.6.32-573.3.1.el6.x86_64 x86_64

Attached is a txt file of the khmer output.

Could this be an issue of cutadapt changing the file format? I am able to run assemblies on cutadapt processed reads.


Best,
Will Shoemaker
--
Will Shoemaker
Indiana University
Graduate Student: Lennon Lab
Evolution, Ecology, & Behavior Program
Jordan Hall 238
wrshoema at umail.iu.edu
@shoemakah<https://twitter.com/shoemakah>
<khmer_error.txt>
_______________________________________________
khmer mailing list
khmer at lists.idyll.org
http://lists.idyll.org/listinfo/khmer




