[protocols] Re: Questions about mRNAseq data diginorm

Fri Mar 28 08:52:09 PDT 2014

Shu,

The split-paired-reads script takes a file of interleaved, matching pairs
and simply splits them into two files.  If the resulting files are different
sizes, it's likely that some pairs are missing from your original files.  I
would start searching there.
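
One way to search is to scan the interleaved file for reads whose mate
does not follow them directly.  The sketch below is only an illustration and
is not part of khmer: the helper names (parse_header, find_orphans) are made
up, and it assumes plain 4-line FASTQ records whose read names end in /1 and
/2, or carry a Casava-style " 1:..." / " 2:..." tag after the name.

```python
def parse_header(header):
    """Return (base_name, mate) for a FASTQ header; mate is '1', '2' or None."""
    fields = header.lstrip("@").split()
    name = fields[0]
    if name.endswith(("/1", "/2")):
        # Older Illumina style: @read/1 and @read/2
        return name[:-2], name[-1]
    if len(fields) > 1 and fields[1][:2] in ("1:", "2:"):
        # Casava 1.8 style: @read 1:N:0:INDEX
        return name, fields[1][0]
    return name, None


def find_orphans(lines):
    """List reads in an interleaved FASTQ that lack an adjacent mate.

    Assumes every record is exactly four lines (hypothetical helper).
    """
    orphans = []
    pending = None  # name of a /1 read still waiting for its /2 mate
    for i, line in enumerate(lines):
        if i % 4 != 0 or not line.strip():
            continue  # only the header line of each record is needed
        base, mate = parse_header(line.strip())
        if mate == "2" and pending == base:
            pending = None  # a complete pair
            continue
        if pending is not None:
            orphans.append(pending + "/1")  # the previous /1 never got its mate
            pending = None
        if mate == "1":
            pending = base
        else:
            orphans.append(base if mate is None else base + "/2")
    if pending is not None:
        orphans.append(pending + "/1")
    return orphans
```

To try it on a gzipped file, pass an open text handle, for example
find_orphans(gzip.open(path, "rt")) after importing gzip.  If the returned
list is not empty, the file is not strictly interleaved, and splitting it
would be expected to give left and right files of different sizes.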

Cheers,
Adina

On Fri, Mar 28, 2014 at 11:44 AM, Shu CHEN <szc0049 at tigermail.auburn.edu> wrote:

>  I have Illumina HiSeq data, and I followed the Eel Pond mRNAseq
> Protocol (https://khmer-protocols.readthedocs.org/en/latest/mrnaseq/),
> except for step 1, trimming the adapters.
>
> The left.fq and right.fq were generated from the file
> '.pe.qc.keep.adunfilt.fq.gz', using the script
> 'split-paired-reads.py', before running them through Trinity.
>
> I followed the protocol exactly and I don't understand why I got
> different-sized files.
>
>
>   Shu Chen
>
> Ph.D. Student
>
> Department of Agronomy and Soils
>
> RM 165, Funchess Hall
>   ------------------------------
> *From:* adina.chuang at gmail.com <adina.chuang at gmail.com> on behalf of Adina Chuang
> Howe <howead at msu.edu>
> *Sent:* March 28, 2014 15:18
> *To:* Shu CHEN
> *Cc:* protocols
> *Subject:* Re: [protocols] Questions about mRNAseq data diginorm
>
>  Hi Shu,
>
>  Thanks for writing and using diginorm.  If you are using a metagenomic
> dataset, I would not expect that much difference between diginormed and
> nondiginormed data (see Howe et al, 2014).  That being said, this would
> depend on your dataset - diginorm is not going to work well for datasets
> with lots of repetitive regions, for example.
>
>  If your adapters are trimmed off (which is highly unusual -- your reads
> should be different lengths in this case), you should be able to skip the
> trimming step.  The interleave step brings together your paired end reads
> that remain after trimming into one file.  So you can skip this step if
> your file has each pair in this order (>pair 1, sequence, >pair 2,
> sequence).
>
>  I'm not sure which left and right fq you are referring to, but it is not
> unusual to have different sized pair files after quality trimming.  You'll
> want to do some processing to pull out pairs that remain and treat any
> quality trimmed orphaned reads as single (unpaired) sequences.  Again, for
> diginorm, if you want to consider paired ends, you'll want to keep it in
> the format described above.
>
>  In general, I would advise you to start the process with the dataset
> mentioned in the tutorial. It then becomes a lot clearer what each step
> does.  There are a lot of steps (filtering, processing, etc.) that
> happen even prior to diginorm.  This might be where the difference in
> your first question comes from.
>
>  Good luck!
> Adina
>
>
>
>
> On Fri, Mar 28, 2014 at 11:06 AM, Shu CHEN <szc0049 at tigermail.auburn.edu> wrote:
>
>>  Hi,
>>
>>
>>   I am trying to use khmer to diginorm my Illumina data and I have some
>> questions about it:
>>
>> 1. I diginormed my data following the Eel Pond mRNAseq Protocol. The N50
>> of the assembly is 1242, smaller than that of the assembly from the
>> non-diginormed data, and the number of contigs is half that of the
>> non-diginormed assembly. Is it normal that both the assembly size and the
>> N50 become smaller?
>>
>> 2. Because I received the data with the adapters already trimmed off, I
>> skipped step 1 and went directly to the interleave step. Does this cause
>> any problems in the downstream process?
>>
>>  3. After splitting the *.pe file into left.fq and right.fq, the left.fq
>> has a size of 1.7 GB, and the right.fq 1.4 GB. Is this okay?
>>
>> Thank you very much. I appreciate your time and patience.
>>
>>
>>   Shu Chen
>>
>> Ph.D. Student
>>
>> Department of Agronomy and Soils
>>
>> RM 165, Funchess Hall
>>
>> _______________________________________________
>> protocols mailing list
>> protocols at lists.idyll.org
>> http://lists.idyll.org/listinfo/protocols
>>
>>
>