<div dir="ltr">Shu,<div><br></div><div>The split-paired-reads script takes a file in which the pairs are already matched and simply splits them. If the resulting files are different sizes, it's likely that some pairs are missing from your original files. I would start searching there. </div>
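<div>One way to test this is to compare the number of records in left.fq and right.fq and check that the mates line up by name. Byte size alone can differ between correctly paired files if read 1 and read 2 were trimmed to different lengths, so record counts are the more direct check. The Python below is a minimal sketch, not part of khmer. It assumes uncompressed four-line FASTQ and mate names of the form NAME/1 and NAME/2 (or NAME 1:N:... and NAME 2:N:...). The function names are hypothetical.</div>
<pre>
```python
import itertools


def fastq_headers(lines):
    """Yield each record's header line without the leading '@'.

    Assumes uncompressed FASTQ with exactly four lines per record; `lines`
    can be an open file or any iterable of strings.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")[1:]


def base_name(header):
    """Strip the mate marker so both reads of a pair share one name.

    Assumed conventions: 'NAME/1' vs 'NAME/2', or 'NAME 1:N:...' vs
    'NAME 2:N:...' (everything after the first space is dropped).
    """
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def compare_mates(left_lines, right_lines):
    """Return (n_left, n_right, first_mismatch).

    first_mismatch is the 0-based index of the first record whose mate is
    missing or has a different name, or None if every record is paired.
    """
    n_left = n_right = 0
    first_mismatch = None
    pairs = itertools.zip_longest(fastq_headers(left_lines),
                                  fastq_headers(right_lines))
    for index, (left, right) in enumerate(pairs):
        if left is not None:
            n_left += 1
        if right is not None:
            n_right += 1
        if first_mismatch is None and (
                left is None or right is None
                or base_name(left) != base_name(right)):
            first_mismatch = index
    return n_left, n_right, first_mismatch


# Example with the file names from this thread:
# with open("left.fq") as left, open("right.fq") as right:
#     print(compare_mates(left, right))
```
</pre>
<div>If both counts match and first_mismatch is None, the files are correctly paired and the size difference comes from read lengths rather than missing mates.</div>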
<div><br></div><div>Cheers,</div><div>Adina</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Mar 28, 2014 at 11:44 AM, Shu CHEN <span dir="ltr"><<a href="mailto:szc0049@tigermail.auburn.edu" target="_blank">szc0049@tigermail.auburn.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div style="font-size:12pt;font-family:Calibri,Arial,Helvetica,sans-serif">
<p>What I have is Illumina HiSeq data, and I followed the <span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">Eel Pond mRNAseq Protocol (<a href="https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/" target="_blank">https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/</a>)</span>, except for step 1,<span style="font-size:12pt;line-height:15.600000381469727px"> trimming the adapters.</span></p>
<p><span style="font-size:12pt;line-height:15.600000381469727px"><span style="font-size:16px;font-family:calibri,arial,helvetica,sans-serif">The left.fq and right.fq files were generated from the '.pe.qc.keep.abundfilt.fq.gz' file
using the script 'split-paired-reads.py', before running Trinity.</span></span></p>
<p><span style="font-size:12pt;line-height:15.600000381469727px">Otherwise I followed the protocol exactly, and I don't understand why I got files of different sizes.</span></p><div class="">
<div>
<p><br>
</p>
<div style="font-family:tahoma;font-size:13px">
<div style="font-family:tahoma;font-size:13px">
<p>Shu Chen<br>
</p>
<p>Ph.D. Student</p>
<p>Department of Agronomy and Soils</p>
<p>RM 165, Funchess Hall </p>
</div>
</div>
</div>
</div><div style="color:#282828">
<hr style="display:inline-block;width:98%">
<div dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> <a href="mailto:adina.chuang@gmail.com" target="_blank">adina.chuang@gmail.com</a> <<a href="mailto:adina.chuang@gmail.com" target="_blank">adina.chuang@gmail.com</a>> on behalf of Adina Chuang Howe <<a href="mailto:howead@msu.edu" target="_blank">howead@msu.edu</a>><br>
<b>Sent:</b> March 28, 2014 15:18<br>
<b>To:</b> Shu CHEN<br>
<b>Cc:</b> protocols<br>
<b>Subject:</b> Re: [protocols] Questions about mRNAseq data diginorm</font>
<div> </div>
</div><div><div class="h5">
<div>
<div dir="ltr">Hi Shu,
<div><br>
</div>
<div>Thanks for writing, and for using diginorm. If you are using a metagenomic dataset, I would not expect much difference between diginormed and non-diginormed data (see Howe et al., 2014). That said, this depends on your dataset; diginorm is
not going to work well for datasets with lots of repetitive regions, for example. </div>
<div><br>
</div>
<div>If your adapters have already been trimmed off (which is highly unusual; your reads should be different lengths in this case), you should be able to skip the trimming step. The interleave step brings the paired-end reads that remain after trimming together into
one file, so you can skip this step if your file already has each pair in this order (>pair 1, sequence, >pair 2, sequence). </div>
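<div>As a rough illustration of what the interleave step produces, here is a minimal Python sketch that alternates records from two files whose mates are listed in the same order. It is not khmer's interleave-reads.py; it assumes uncompressed four-line FASTQ, and the function names and the file names in the example comment are hypothetical.</div>
<pre>
```python
def fastq_records(lines):
    """Group FASTQ lines into 4-line records, each returned as one string."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield "".join(record)
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def interleave(left_lines, right_lines):
    """Yield records alternating read 1, read 2, read 1, read 2, ...

    Assumes both inputs list mates in the same order. Raises ValueError if
    one input runs out first, since that means some reads lost their mates.
    """
    left = fastq_records(left_lines)
    right = fastq_records(right_lines)
    sentinel = object()
    while True:
        r1 = next(left, sentinel)
        r2 = next(right, sentinel)
        if r1 is sentinel and r2 is sentinel:
            return
        if r1 is sentinel or r2 is sentinel:
            raise ValueError("input files contain different numbers of reads")
        yield r1
        yield r2


# Example (hypothetical file names):
# with open("reads_R1.fq") as f1, open("reads_R2.fq") as f2, \
#         open("reads.pe.fq", "w") as out:
#     out.writelines(interleave(f1, f2))
```
</pre>
<div>Raising an error on unequal input lengths, rather than silently truncating, makes a missing-mate problem visible at this step instead of later in the pipeline.</div>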
<div><br>
</div>
<div>I'm not sure which left and right fq files you are referring to, but it is not unusual to have pair files of different sizes after quality trimming. You'll want to do some processing to pull out the pairs that remain and treat any reads orphaned by quality trimming as single
(unpaired) sequences. Again, for diginorm, if you want to consider paired ends, you'll want to keep them in the format described above.</div>
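<div>The processing described above, keeping the pairs that survived trimming and setting orphans aside as single-end reads, can be sketched as follows. khmer provides extract-paired-reads.py for this job; the Python below is only a simplified illustration of the idea, not that script. It assumes an interleaved, uncompressed four-line FASTQ in which each surviving pair appears as two consecutive records named NAME/1 and NAME/2 (or NAME 1:... and NAME 2:...). All function names are hypothetical.</div>
<pre>
```python
def fastq_records(lines):
    """Group FASTQ lines into 4-line records, each returned as one string."""
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            yield "".join(record)
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def base_name(header):
    """Drop the assumed mate marker ('/1', '/2', or text after a space)."""
    name = header.split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_pairs_and_orphans(lines):
    """Return (paired, orphans) as lists of FASTQ record strings.

    Two consecutive records whose base names match are kept as a pair;
    any record without an adjacent mate is treated as single-end.
    """
    paired, orphans = [], []
    pending = None  # (name, record) still waiting for its mate
    for record in fastq_records(lines):
        name = base_name(record.split("\n", 1)[0][1:])
        if pending is not None and pending[0] == name:
            paired.extend([pending[1], record])
            pending = None
        else:
            if pending is not None:
                orphans.append(pending[1])
            pending = (name, record)
    if pending is not None:
        orphans.append(pending[1])
    return paired, orphans
```
</pre>
<div>The paired list stays interleaved, so it can go to diginorm as paired-end data, and the orphans can be processed as single-end reads.</div>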
<div><br>
</div>
<div>In general, I would advise you to start the process with the dataset mentioned in the tutorial. That makes it a lot clearer what each step does. There are a lot of filtering and processing steps that happen even before diginorm. This might
be where the difference in your first question comes from.</div>
<div><br>
</div>
<div>Good luck!</div>
<div>Adina</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Fri, Mar 28, 2014 at 11:06 AM, Shu CHEN <span dir="ltr">
<<a href="mailto:szc0049@tigermail.auburn.edu" target="_blank">szc0049@tigermail.auburn.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:#cccccc;border-left-style:solid;padding-left:1ex">
<div>
<div style="font-size:12pt;font-family:calibri,arial,helvetica,sans-serif">
<p><span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">Hi, </span></p>
<p><span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif"><br>
</span></p>
<p><span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif"> I am trying to use khmer to diginorm my Illumina data, and I have some questions about it: </span><br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">1. I diginormed my data following the Eel Pond mRNAseq Protocol. The N50 of the assembly is 1242, smaller than that of the assembly from
the non-diginormed data, and the number of contigs is half that of the non-diginormed assembly. Is it normal for both the assembly size and the N50 to become smaller?</span><br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">2. Because I received the data with the adapters already trimmed off, I skipped step 1 and went directly to the interleave step. Does this
cause any problems in the downstream process?</span><br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif"> 3. After splitting the *.pe file into left.fq and right.fq, left.fq is 1.7 GB and right.fq is 1.4 GB. Is this
okay?</span><br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<br style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">
<span style="color:#282828;font-size:15px;font-family:'microsoft yahei ui','microsoft yahei',微软雅黑,simsun,宋体,sans-serif">Thank you very much. I appreciate your time and patience.</span><br>
</p>
</div>
</div>
<br>
_______________________________________________<br>
protocols mailing list<br>
<a href="mailto:protocols@lists.idyll.org" target="_blank">protocols@lists.idyll.org</a><br>
<a href="http://lists.idyll.org/listinfo/protocols" target="_blank">http://lists.idyll.org/listinfo/protocols</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</div></div></div>
</div>
</div>
</blockquote></div><br></div>