[protocols] Questions about mRNAseq data diginorm

Fri Mar 28 08:18:05 PDT 2014

Hi Shu,

Thanks for writing and using diginorm.  If you are using a metagenomic
dataset, I would not expect that much difference between diginormed and
non-diginormed data (see Howe et al., 2014).  That being said, this will
depend on your dataset: diginorm is not going to work well for datasets
with lots of repetitive regions, for example.
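
To make the point about repeats concrete, here is a minimal sketch of the
idea behind digital normalization: keep a read only if the median count of
its k-mers, among the reads kept so far, is still below a coverage cutoff.
This is a simplified illustration and not khmer's implementation (khmer
uses a memory-efficient probabilistic counting structure rather than an
exact dictionary). The function names and the default k and cutoff values
are assumptions made for the example.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def diginorm(reads, k=20, cutoff=20):
    """Keep a read only while its median k-mer count is below cutoff.

    Simplified sketch: exact counts in a dict, single-end reads only.
    """
    counts = Counter()
    kept = []
    for read in reads:
        ks = kmers(read, k)
        if not ks:
            continue  # read shorter than k, nothing to count
        # Counter returns 0 for k-mers that have not been seen yet.
        if median(counts[km] for km in ks) < cutoff:
            kept.append(read)
            counts.update(ks)
    return kept
```

Under this scheme, reads from a repeat look like reads from a region that
is already well covered, so they tend to be discarded early, which is one
way repetitive datasets can end up with a smaller assembly.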

If your adapters are already trimmed off (which is highly unusual -- your
reads should be different lengths in this case), you should be able to skip
the trimming step.  The interleave step brings together the paired-end reads
that remain after trimming into one file, so you can skip it if your file
already has each pair in this order (>pair 1, sequence, >pair 2,
sequence).
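
If you are unsure whether a file is already in that order, a quick check
along these lines may help. It is only a sketch: it assumes FASTQ input
with four lines per record, and read names that end in /1 and /2 (newer
Illumina headers mark the mate differently, so adjust the name test for
your data). The function names are made up for the example.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if len(chunk) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = (s.rstrip("\n") for s in chunk)
        # Drop the leading '@' and anything after the first whitespace.
        yield header[1:].split()[0], seq, qual


def is_interleaved(records):
    """True if records come as consecutive (name/1, name/2) pairs."""
    records = list(records)
    if len(records) % 2:
        return False
    for (n1, _, _), (n2, _, _) in zip(records[0::2], records[1::2]):
        same_base = n1[:-2] == n2[:-2]
        if not (n1.endswith("/1") and n2.endswith("/2") and same_base):
            return False
    return True
```

For a real file you would pass an open file handle to read_fastq, for
example is_interleaved(read_fastq(open("reads.pe.fq"))), where the file
name is a placeholder.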

I'm not sure which left and right fq files you are referring to, but it is
not unusual for the two files of a pair to differ in size after quality
trimming.  You'll want to do some processing to pull out the pairs that
remain and treat any reads orphaned by quality trimming as single
(unpaired) sequences.  Again, for diginorm, if you want to consider paired
ends, you'll want to keep them in the interleaved format described above.
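
As a rough illustration of that processing, the sketch below matches reads
from the two files by name and separates surviving pairs from orphans. It
assumes each record is a (name, sequence, quality) tuple and that names
end in /1 and /2; the function name is hypothetical. It also holds one
whole file in memory, so for files of a gigabyte or more you would want a
streaming approach or an existing tool instead.

```python
def split_pairs_and_orphans(left, right):
    """Separate surviving pairs from orphaned reads.

    left, right: lists of (name, seq, qual) tuples whose names end in
    /1 and /2 respectively. Returns (pairs, orphans), where pairs is a
    list of (left_record, right_record) and orphans is a list of records.
    """
    # Index the right-hand reads by name without the /2 suffix.
    right_by_base = {rec[0][:-2]: rec for rec in right}
    pairs, orphans = [], []
    for rec in left:
        mate = right_by_base.pop(rec[0][:-2], None)
        if mate is None:
            orphans.append(rec)          # left read lost its mate
        else:
            pairs.append((rec, mate))    # both mates survived trimming
    # Whatever is still in the index never found a left mate.
    orphans.extend(right_by_base.values())
    return pairs, orphans
```

Writing the pairs out as left record, then right record, gives the
interleaved order described above; the orphans go to a separate
single-end file.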

In general, I would advise you to start the process with the dataset
mentioned in the tutorial.  It becomes a lot clearer then what each step
does.  There are a lot of filtering and processing steps that happen even
before diginorm.  This might be where the difference you describe in your
first question comes from.

Good luck!
Adina

On Fri, Mar 28, 2014 at 11:06 AM, Shu CHEN <szc0049 at tigermail.auburn.edu> wrote:

>  Hi,
>
>
>   I am trying to use khmer to diginorm my illumina data and I have some
> questions about it:
>
> 1. I diginormed my data following the Eel Pond mRNAseq Protocol. The N50
> of the assembly is 1242, smaller than the assembly from the non-diginormed
> data, and also the number of the contigs is half of the non-diginormed. Is
> this normal that both assembly size and N50 becomes smaller?
>
> 2. Because I got the data with adapter trimmed off, so I passed the step 1
> and directly went to the interleave step. Does this cause any problems in
> the downstream process?
>
>  3. After splitting the *.pe file into left.fq and right.fq, the left.fq
> has a size of 1.7gb, and the right.fq 1.4gb. Is this okay?
>
> Thank you very much. I appreciate your time and patience.
>
>
>   Shu Chen
>
> Ph.D. Student
>
> Department of Agronomy and Soils
>
> RM 165, Funchess Hall
>
> _______________________________________________
> protocols mailing list
> protocols at lists.idyll.org
> http://lists.idyll.org/listinfo/protocols
>
>