[khmer] Diginorm with normalize-by-median-pct.py script

C. Titus Brown ctb at msu.edu
Wed Apr 24 12:52:52 PDT 2013


On Wed, Apr 24, 2013 at 03:47:36PM -0400, Howard W. Fescemyer wrote:
> Dear Titus:
>
> After running your normalize-by-median-pct.py script, two sequence read  
> containing files were obtained in the output; one with the ".keep"  
> extension and the other with the ".keepmedpct" extension.

hi Howard,

the .keepmedpct file is produced by the normalize-by-median-pct.py script from

	https://github.com/ctb/khmer/blob/trinity/sandbox/normalize-by-median-pct.py

while the .keep file comes from straight ol' diginorm.

> Please let me know which of these files contains my normalized read  
> data.  The reason I ask is because both files have fastq formatted  
> reads, but contain very different numbers of reads.
>
> I started with about 215 M read pairs.  The ".keepmedpct" file has about
> 17 M read pairs, while the ".keep" file has only about 0.4 M read pairs.
>
> Here is my run command: normalize-by-median-pct.py -p -C 30 -k 25 -N 4
> -x 4e9 WBtrmd_AllR1R2_modinter.fastq.

That looks about right!
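
If you want to double-check the read-pair counts in the two output files,
something like the rough sketch below should do it. It assumes plain
four-line FASTQ records and interleaved pairs (as with -p); the function
names are just made up for illustration and are not part of khmer.

```python
import gzip


def count_fastq_records(path):
    """Count FASTQ records, assuming exactly four non-empty lines per record."""
    # Assumption: files ending in .gz are gzip-compressed, everything else is text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_read_pairs(path):
    """Count read pairs in an interleaved FASTQ file (two records per pair)."""
    n_records = count_fastq_records(path)
    if n_records % 2 != 0:
        raise ValueError(f"{path}: odd record count, file is not fully paired")
    return n_records // 2
```

Call count_read_pairs() once on the .keep file and once on the .keepmedpct
file and compare the two numbers.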

> Here are some other normalization outcomes for comparison: 1) Diginorm
> using normalize-by-median (C = 5, k = 25) outputs about 13.5 M read
> pairs, 2) Trinitynorm (max_cov = 30, min_kmer_cov = 2, k = 25) outputs
> about 9.6 M read pairs, and 3) Trinitynorm (max_cov = 5, min_kmer_cov =
> 2, k = 25) outputs about 9 M read pairs.
>
> I am in the process of assembling the Trinity normalized data so I can  
> compare it with the assembly using data from Diginorm using  
> normalize-by-median.  It would be great to include in my comparison data  
> from Diginorm using normalize-by-median-pct.

I do not think the C=5 data will be worth using from any of those...  For
RNAseq and Trinity, we generally recommend doing a single pass to C=20.
Lower than that and you will start to accumulate errors, and you will also
get bad assemblies from Trinity.
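
As a rough sketch, a single-pass C=20 run could be put together like this.
It only builds and prints the command line; the flags are the ones from your
command above, and I'm assuming they carry over unchanged to
normalize-by-median.py. The helper name and its defaults are made up for
illustration.

```python
import shlex


def diginorm_command(reads, coverage=20, ksize=25, n_tables=4,
                     table_size="4e9", paired=True):
    """Build a normalize-by-median.py command line as a list of arguments."""
    cmd = ["normalize-by-median.py"]
    if paired:
        cmd.append("-p")  # paired (interleaved) reads, as in the original run
    cmd += ["-C", str(coverage),   # coverage cutoff
            "-k", str(ksize),      # k-mer size
            "-N", str(n_tables),   # number of hash tables
            "-x", table_size,      # size of each hash table
            reads]
    return cmd


# Print the command for inspection instead of running it.
print(shlex.join(diginorm_command("WBtrmd_AllR1R2_modinter.fastq")))
```

Check the printed command against the guidelines before actually running it.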

See:

http://khmer.readthedocs.org/en/latest/guide.html

for our guidelines.

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu



