<div dir="ltr">Shu,<div><br></div><div>The split-paired-reads script takes a file in which your reads are already in matching pairs and simply splits them. &nbsp;If the resulting files have different sizes, it&#39;s likely that some pairs are missing from your original files. &nbsp;I would start searching there. &nbsp;</div>
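<div>To illustrate why missing mates show up as output files of different sizes, here is a minimal Python sketch of splitting an interleaved FASTQ file. It is not the split-paired-reads.py source: the function name split_by_suffix is hypothetical, and the sketch assumes well-formed four-line FASTQ records whose names end in /1 or /2 (newer Illumina headers mark the mate differently). Counting records, rather than comparing file sizes in bytes, shows directly whether the two outputs still match.</div>

```python
import io
from typing import TextIO, Tuple


def split_by_suffix(interleaved: TextIO, left: TextIO, right: TextIO) -> Tuple[int, int]:
    """Copy /1 reads to `left` and /2 reads to `right`; return (n_left, n_right).

    Assumes well-formed 4-line FASTQ records whose names end in /1 or /2.
    """
    n_left = n_right = 0
    while True:
        record = [interleaved.readline() for _ in range(4)]
        if not record[0]:
            break  # end of input
        name = record[0].split()[0]
        if name.endswith("/1"):
            left.writelines(record)
            n_left += 1
        elif name.endswith("/2"):
            right.writelines(record)
            n_right += 1
        else:
            raise ValueError(f"cannot tell which mate {name!r} is")
    return n_left, n_right


if __name__ == "__main__":
    # r2/1 has lost its mate, so the two outputs end up with different record counts.
    reads = "@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGC\n+\nIIII\n@r2/1\nGGCA\n+\nIIII\n"
    left, right = io.StringIO(), io.StringIO()
    print(split_by_suffix(io.StringIO(reads), left, right))  # (2, 1)
```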


<div><br></div><div>Cheers,</div><div>Adina</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 11:44 AM, Shu CHEN <span dir="ltr">&lt;<a href="mailto:szc0049@tigermail.auburn.edu" target="_blank">szc0049@tigermail.auburn.edu</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div>

<div style="font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif">

<p>I have Illumina HiSeq data, and&nbsp;I followed the&nbsp;<span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">Eel Pond mRNAseq Protocol (<a href="https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/" target="_blank">https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/</a>)</span>, except for step 1,<span style="font-size:12pt;line-height:15.600000381469727px">&nbsp;adapter trimming.&nbsp;</span></p>


<p><span style="font-size:12pt;line-height:15.600000381469727px"><span style="font-size:16px;font-family:calibri,arial,helvetica,sans-serif">The left.fq and right.fq files were generated from the &#39;.pe.qc.keep.abundfilt.fq.gz&#39; file using the script &#39;split-paired-reads.py&#39;, before running them through Trinity.</span></span></p>

<p><span style="font-size:12pt;line-height:15.600000381469727px">I followed the protocol exactly, so I don&#39;t understand why I got files of different sizes.</span></p><div class="">

<div>

<p><br>

</p>

<div style="font-family:tahoma;font-size:13px">

<div style="font-family:tahoma;font-size:13px">

<p>Shu Chen<br>

</p>

<p>Ph.D. Student</p>

<p>Department of Agronomy and Soils</p>

<p>RM 165, Funchess Hall </p>

</div>

</div>

</div>

</div><div style="color:#282828">

<hr style="display:inline-block;width:98%">

<div dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> <a href="mailto:adina.chuang@gmail.com" target="_blank">adina.chuang@gmail.com</a> &lt;<a href="mailto:adina.chuang@gmail.com" target="_blank">adina.chuang@gmail.com</a>&gt; on behalf of Adina Chuang Howe &lt;<a href="mailto:howead@msu.edu" target="_blank">howead@msu.edu</a>&gt;<br>

<b>Sent:</b> March 28, 2014 15:18<br>

<b>To:</b> Shu CHEN<br>

<b>Cc:</b> protocols<br>

<b>Subject:</b> Re: [protocols] Questions about mRNAseq data diginorm</font>

<div>&nbsp;</div>

</div><div><div class="h5">

<div>

<div dir="ltr">Hi Shu,

<div><br>

</div>

<div>Thanks for writing and for using diginorm. &nbsp;If you are using a metagenomic dataset, I would not expect much difference between diginormed and non-diginormed data (see Howe et al., 2014). &nbsp;That said, this depends on your dataset: diginorm will not work well on datasets with many repetitive regions, for example. &nbsp;</div>
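<div>For reference, the core rule of diginorm as run by khmer&#39;s normalize-by-median.py is to keep a read only if the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff C. The Python sketch below shows that rule under simplifying assumptions: single-end reads only, exact counting with a Counter in place of khmer&#39;s fixed-memory probabilistic counting table, and illustrative k and C values; the function names are hypothetical. One way to see the repeat problem is that k-mers shared by many repeat copies push a read&#39;s median up quickly, so the read can be discarded even when it comes from a distinct copy.</div>

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def kmers(seq: str, k: int) -> List[str]:
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads: Iterable[str], k: int = 20, cutoff: int = 20) -> List[str]:
    """Keep a read only while the median count of its k-mers (over kept reads) is below cutoff."""
    counts = Counter()  # exact counts; khmer itself uses a fixed-memory probabilistic table
    kept = []
    for seq in reads:
        read_kmers = kmers(seq, k)
        if not read_kmers:
            continue  # read shorter than k: this sketch simply drops it
        if median([counts[km] for km in read_kmers]) < cutoff:
            kept.append(seq)
            counts.update(read_kmers)  # only kept reads contribute to the counts
    return kept


if __name__ == "__main__":
    reads = ["AAAACCCCGGGG"] * 5 + ["TTTTGGGGTTTT"]
    # The repeated read is kept 3 times (until its median k-mer count reaches 3);
    # the new read shares only GGGG with it, so its median is 0 and it is kept.
    print(normalize_by_median(reads, k=4, cutoff=3))
```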

<div><br>

</div>

<div>If your adapters have already been trimmed off (which is highly unusual; your reads should then have different lengths), you should be able to skip the trimming step. &nbsp;The interleave step combines the paired-end reads that remain after trimming into a single file. &nbsp;You can therefore skip this step if your file already lists each pair in this order (&gt;pair 1, sequence, &gt;pair 2, sequence). &nbsp;</div>
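<div>A minimal Python sketch of what interleaving means, assuming two FASTQ files (left and right) that list mates in the same order, with names that differ only by a /1 or /2 suffix; in FASTQ the layout is the same as above, except that each record header starts with @. The function names are hypothetical and this is not the khmer script itself. It writes pair 1 then pair 2 for each read pair and stops with an error as soon as the two files fall out of step.</div>

```python
import io
from typing import List, TextIO


def read_record(handle: TextIO) -> List[str]:
    """Read one 4-line FASTQ record; return [] at end of file."""
    lines = [handle.readline() for _ in range(4)]
    return lines if lines[0] else []


def mate_id(header: str) -> str:
    """Read ID without '@', any trailing comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(left: TextIO, right: TextIO, out: TextIO) -> int:
    """Write left mate, right mate, left mate, ... to `out`; return the number of pairs."""
    pairs = 0
    while True:
        r1, r2 = read_record(left), read_record(right)
        if not r1 and not r2:
            return pairs  # both files ended together
        if not r1 or not r2 or mate_id(r1[0]) != mate_id(r2[0]):
            raise ValueError(f"left and right files are out of step at pair {pairs + 1}")
        out.writelines(r1 + r2)
        pairs += 1


if __name__ == "__main__":
    left = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
    right = io.StringIO("@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n")
    out = io.StringIO()
    print(interleave(left, right, out))  # 2
```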

<div><br>

</div>

<div>I&#39;m not sure which left and right fq files you are referring to, but it is not unusual for paired files to differ in size after quality trimming. &nbsp;You&#39;ll want to do some processing to pull out the pairs that remain and treat any reads orphaned by quality trimming (reads whose mate was removed) as single (unpaired) sequences. &nbsp;Again, for diginorm, if you want to take paired ends into account, you&#39;ll want to keep them in the format described above.</div>
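<div>One way to do that processing, sketched in Python under stated assumptions: the input is an interleaved, quality-trimmed FASTQ file with four-line records and /1, /2 mate suffixes, and a read counts as paired only when its mate immediately follows it. The function extract_pairs is hypothetical; khmer ships extract-paired-reads.py for this job, and this sketch is not its source.</div>

```python
import io
from typing import List, TextIO, Tuple


def read_records(handle: TextIO) -> List[List[str]]:
    """Split a FASTQ stream into 4-line records (assumes no wrapped sequence lines)."""
    lines = handle.read().splitlines(keepends=True)
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def mate_id(header: str) -> str:
    """Read ID without '@', any trailing comment, or a /1 or /2 suffix."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def extract_pairs(handle: TextIO, paired: TextIO, single: TextIO) -> Tuple[int, int]:
    """Write adjacent mates to `paired` and orphaned reads to `single`; return both counts."""
    n_pairs = n_single = 0
    pending = None  # the previous record, still waiting for its mate
    for record in read_records(handle):
        if pending is not None and mate_id(pending[0]) == mate_id(record[0]):
            paired.writelines(pending + record)
            n_pairs += 1
            pending = None
        else:
            if pending is not None:  # the previous read's mate never arrived
                single.writelines(pending)
                n_single += 1
            pending = record
    if pending is not None:
        single.writelines(pending)
        n_single += 1
    return n_pairs, n_single


if __name__ == "__main__":
    names = ["r1/1", "r1/2", "r2/1", "r3/2", "r4/1", "r4/2"]  # r2 and r3 each lost a mate
    data = "".join(f"@{name}\nACGT\n+\nIIII\n" for name in names)
    paired, single = io.StringIO(), io.StringIO()
    print(extract_pairs(io.StringIO(data), paired, single))  # (2, 2)
```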

<div><br>

</div>

<div>In general, I would advise you to start by running the process on the dataset used in the tutorial; it then becomes a lot clearer what each step does. &nbsp;There are a lot of filtering, processing, and other steps that happen even before diginorm. &nbsp;This might be where the difference in your first question comes from.</div>

<div><br>

</div>

<div>Good luck!</div>

<div>Adina</div>

<div><br>

</div>

<div><br>

</div>

</div>

<div class="gmail_extra"><br>

<br>

<div class="gmail_quote">On Fri, Mar 28, 2014 at 11:06 AM, Shu CHEN <span dir="ltr">

&lt;<a href="mailto:szc0049@tigermail.auburn.edu" target="_blank">szc0049@tigermail.auburn.edu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:#cccccc;border-left-style:solid;padding-left:1ex">

<div>

<div style="font-size:12pt;font-family:calibri,arial,helvetica,sans-serif">

<p><span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">Hi,&nbsp;</span></p>

<p><span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif"><br>

</span></p>

<p><span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">&nbsp;I am trying to use khmer to diginorm my illumina data and I&nbsp;have some questions about it:&nbsp;</span><br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">


<br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">1.&nbsp;I diginormed my data following the Eel Pond mRNAseq Protocol. The N50 of the&nbsp;assembly&nbsp;is 1242, smaller than that of the assembly from the non-diginormed data, and the number of contigs is also half that of the non-diginormed assembly. Is it normal for both the&nbsp;assembly size and the N50 to&nbsp;become&nbsp;smaller?</span><br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">


<br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">2.&nbsp;Because my data came with the adapters already trimmed off, I skipped step 1 and went directly to the interleave step. Does this cause any problems downstream?</span><br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">3. After splitting the *.pe file into left.fq and right.fq, left.fq is 1.7 GB and right.fq is 1.4 GB. Is this okay?</span><br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<br style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">

<span style="color:#282828;font-size:15px;font-family:&#39;microsoft yahei ui&#39;,&#39;microsoft yahei&#39;,微软雅黑,simsun,宋体,sans-serif">Thank you very much. I appreciate your time and patience.</span><br>

</p>
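<div>On question 1: N50 is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. Below is a minimal Python sketch of that definition; the function name n50 is hypothetical, and assembly-statistics tools may differ in details such as a minimum contig length. Running the same code on both assemblies at least ensures the two numbers are computed the same way.</div>

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:  # walk from the longest contig down
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs to summarize")


if __name__ == "__main__":
    print(n50([100, 200, 300, 400, 500]))  # 400
```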


</div>

</div>

<br>

_______________________________________________<br>

protocols mailing list<br>

<a href="mailto:protocols@lists.idyll.org" target="_blank">protocols@lists.idyll.org</a><br>

<a href="http://lists.idyll.org/listinfo/protocols" target="_blank">http://lists.idyll.org/listinfo/protocols</a><br>

<br>

</blockquote>

</div>

<br>

</div>

</div>

</div></div></div>

</div>

</div>


</blockquote></div><br></div>