<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <font face="Calibri">Hi Raf, hi Titus,<br>

      <br>

      Yesterday I hit the same error with a different public dataset after

      running a different trimming tool (trim_galore).<br>

      After trimming, you probably have some reads that are shorter than

      the k-mer size used for normalization.<br>

      'normalize-by-median.py' seems to skip those reads, and because the

      other read of the pair is then not "found", it raises the

      unpaired-reads error.<br>

      In my case, removing reads shorter than the k-mer size solved the

      problem.<br>

    </font>So it looks more like a misleading error message than a bug.<br>
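    <br>

    For illustration, the filtering step described above could be sketched
    roughly as follows. This is only a sketch under assumptions, not part of
    khmer: the helper names read_fastq and drop_short_pairs are hypothetical,
    the input is assumed to be an interleaved FASTQ file with exactly four
    lines per record, and dropping both reads of a pair when either one is
    too short is my own choice so that the output stays strictly paired.<br>

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple

# One FASTQ record: header, sequence, separator ("+") and quality line.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not any(lines):
            # End of input (or only trailing blank lines remain).
            return
        if len(lines) != 4:
            raise ValueError("truncated FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def drop_short_pairs(src: TextIO, dst: TextIO, k: int) -> Tuple[int, int]:
    """Copy pairs whose two reads both have length k or more.

    Assumes the reads are interleaved (read 1, read 2, read 1, ...).
    Returns (pairs_kept, pairs_dropped).
    """
    kept = dropped = 0
    records = read_fastq(src)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not interleaved")
        if len(first[1]) >= k and len(second[1]) >= k:
            for record in (first, second):
                dst.write("\n".join(record) + "\n")
            kept += 1
        else:
            # Drop both reads so no orphan is left behind in the output.
            dropped += 1
    return kept, dropped
```

    With the kmer size of 20 shown in the log below, this would be called as
    drop_short_pairs(src, dst, 20) on two open text files, and the output
    file would then be passed to 'normalize-by-median.py'.<br>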

    <br>

    Regards, <br>

    <br>

    Cedric<br>

    <br>

    <blockquote type="cite">

      <pre>Hi Raf,

this sounds like a bug of some sort, but no clear idea of what's

going on, sorry!  I should be able to take a look at this file this

weekend.

thx,

--titus

On Fri, Oct 30, 2015 at 03:37:25PM +0100, Raf Winand wrote:

&gt;<i> Hi

</i>&gt;<i> 

</i>&gt;<i> I'm trying out some of the examples I found on the internet and am now

</i>&gt;<i> working on part of the data that comes with walk-through called Kalamazoo

</i>&gt;<i> Metagenome Assembly protocol. The data set I'm currently trying out is

</i>&gt;<i> SRR492065. After running trimmomatic I end up with two PE files that I

</i>&gt;<i> interleave using the script 'interleave-reads.py'. When I run

</i>&gt;<i> 'normalize-by-median.py' with the --paired option on this interleaved file,

</i>&gt;<i> it gives an error (output below). I also used the script

</i>&gt;<i> 'extract-paired-reads.py' on the interleaved file and when it finishes it

</i>&gt;<i> says "DONE; read 10264272 sequences, 5132136 pairs and 0 singletons" so the

</i>&gt;<i> original file was probably fine. Running the normalization again on the

&gt; output of 'extract-paired-reads.py' gives the exact same error as before.

</i>&gt;<i> 

&gt; Do you have any idea what might be causing this?

</i>&gt;<i> 

</i>&gt;<i> Best regards

</i>&gt;<i> Raf

</i>&gt;<i> 

</i>&gt;<i> 

&gt; || This is the script normalize-by-median.py in khmer.

</i>&gt;<i> || You are running khmer version 2.0+36.g799039f

</i>&gt;<i> || You are also using screed version 0.9

</i>&gt;<i> ||

</i>&gt;<i> || If you use this script in a publication, please cite EACH of the

</i>&gt;<i> following:

</i>&gt;<i> ||

</i>&gt;<i> ||   * MR Crusoe et al., 2015.

</i>&gt;<i> <a href="http://dx.doi.org/10.12688/f1000research.6924.1">http://dx.doi.org/10.12688/f1000research.6924.1</a>

</i>&gt;<i> ||   * CT Brown et al., arXiv:1203.4802 [q-bio.GN]

</i>&gt;<i> ||

</i>&gt;<i> || Please see <a href="http://khmer.readthedocs.org/en/latest/citations.html">http://khmer.readthedocs.org/en/latest/citations.html</a> for

</i>&gt;<i> details.

</i>&gt;<i> 

</i>&gt;<i> 

</i>&gt;<i> PARAMETERS:

</i>&gt;<i>  - kmer size =    20 (-k)

</i>&gt;<i>  - n tables =     4 (-N)

</i>&gt;<i>  - max tablesize = 8e+09 (-x)

</i>&gt;<i> 

</i>&gt;<i> Estimated memory usage is 3.2e+10 bytes (n_tables x max_tablesize)

</i>&gt;<i> --------

</i>&gt;<i> making countgraph

</i>&gt;<i> ... kept 100000 of 100000 or 100.0% sofar

</i>&gt;<i> ... in file SRR492065_trim_combined.fastq.pe

</i>&gt;<i> ... kept 199984 of 200000 or 100.0% sofar

</i>&gt;<i> ... in file SRR492065_trim_combined.fastq.pe

</i>&gt;<i> ... kept 299832 of 300000 or 99.9% sofar

</i>&gt;<i> ... in file SRR492065_trim_combined.fastq.pe

</i>&gt;<i> ... kept 399356 of 400000 or 99.8% sofar

</i>&gt;<i> ... in file SRR492065_trim_combined.fastq.pe

&gt; ** ERROR: Unpaired reads when require_paired is set!

</i>&gt;<i> ** Failed on SRR492065_trim_combined.fastq.pe:

</i>&gt;<i> ** Exiting!

</i>&gt;<i> 

</i>&gt;<i> 

</i>&gt;<i> 

</i>&gt;<i> 

</i>&gt;<i> -- 

</i>&gt;<i> Raf Winand

</i>&gt;<i> PhD student

</i>&gt;<i> Faculty of Engineering - ESAT/STADIUS

</i>&gt;<i> Bioinformatics Group

</i>&gt;<i> Kasteelpark Arenberg 10 bus 2446

</i>&gt;<i> 3001 Heverlee

</i>&gt;<i> BELGIUM

</i>&gt;<i> Tel: +32 16 32 86 43

</i>

&gt;<i> _______________________________________________

</i>&gt;<i> khmer mailing list

</i>&gt;<i> <a href="http://lists.idyll.org/listinfo/khmer">khmer at lists.idyll.org</a>

</i>&gt;<i> <a href="http://lists.idyll.org/listinfo/khmer">http://lists.idyll.org/listinfo/khmer</a>

</i>

-- 

C. Titus Brown, <a href="http://lists.idyll.org/listinfo/khmer">ctbrown at ucdavis.edu</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

-----------------------------------------------------------------

Cédric Cabau

INRA | SIGENAE | GenPhySE

CS 52627 - 31326 Castanet-Tolosan cedex FRANCE

Tel: +33(0)5.61.28.54.60 - Fax: +33(0)5.61.28.53.08

<a class="moz-txt-link-freetext" href="http://www.sigenae.org/">http://www.sigenae.org/</a>

-----------------------------------------------------------------

</pre>

  </body>

</html>