[khmer] Normalization

Facundo Giorello fagire at gmail.com
Fri Aug 30 08:10:08 PDT 2013

Hi Titus,

I'm trying to build a consensus transcriptome from fourteen transcriptomic

To this end, I joined all the reads of the 14 libraries, and then ran
and Trinity in silico normalization (Trinitynorm) before the assembly (with
I also ran the assembly on the joined reads of all 14 libraries without
normalization (the "multireads strategy").

Unexpectedly, the multireads strategy was the best at reconstructing the
coding sequences. I think this is because the multireads assembly
produced the longest contigs (even though its median and average contig
lengths are lower than those of the assemblies from both normalization
algorithms).
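
For anyone who wants to repeat the length comparison, here is a minimal Python sketch of how the longest, mean and median contig lengths can be computed from a FASTA file. The function names and the tiny example input are hypothetical and only for illustration; this is not the script I used for the attached results.

```python
from statistics import mean, median


def contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Summary statistics used to compare two assemblies."""
    return {
        "n_contigs": len(lengths),
        "longest": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
    }


if __name__ == "__main__":
    # Hypothetical toy input; a real run would pass open("Trinity.fasta").
    example = [">contig_a", "ACGTAC", "GT", ">contig_b", "ACG"]
    print(summarize(contig_lengths(example)))
```

Running it on each assembly separately shows how an assembly can have the longest contigs while still having a lower mean and median, if it also contains many short contigs.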

On the other hand, the assemblies from the normalization algorithms
seem to have "recovered" more (potential) isoforms.
If I count the number of contigs reconstructed for each coding
sequence, the normalization strategies reconstructed more contigs
than multireads (even though they reconstruct the distinct coding
sequences less well). Please see the attached file. In fact, I
calculated the average number of contigs sharing the same "comp_XXX"
(using the IDs of the Trinity-assembled contigs, which I think mark
alternative reconstructions of a given contig), and the Diginorm and
Trinitynorm assemblies had double the value of the multireads assembly.
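
The "comp_XXX" average can be sketched in Python roughly as follows. This assumes the contig headers look like ">comp12_c0_seq1 ..." and that everything before "_seq" identifies the component; both the header pattern and the function names are assumptions for illustration and may need adjusting to the actual Trinity output.

```python
import re
from collections import Counter

# Assumed header layout: >comp<N>_c<M>_seq<K> ...
# The part before "_seq" is treated as the component ID.
COMPONENT_RE = re.compile(r"^>(comp\d+_c\d+)_seq\d+")


def contigs_per_component(headers):
    """Count how many contigs share each component ID."""
    counts = Counter()
    for header in headers:
        match = COMPONENT_RE.match(header)
        if match:
            counts[match.group(1)] += 1
    return counts


def average_contigs_per_component(counts):
    """Mean number of contigs per component (0.0 if there are none)."""
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


if __name__ == "__main__":
    # Hypothetical headers, not taken from the real assemblies.
    headers = [
        ">comp1_c0_seq1 len=300",
        ">comp1_c0_seq2 len=280",
        ">comp2_c0_seq1 len=500",
    ]
    counts = contigs_per_component(headers)
    print(average_contigs_per_component(counts))
```

A higher average means more contigs per component, which could reflect either more isoforms or more fragmented reconstructions of the same transcript.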

Another thing I saw is that the assemblies after normalization
recovered more distinct genes in total (after blastx annotation).

Do you think my explanations of the results are reasonable? Any idea
why assembling after normalization produces a more fragmented assembly?
(I ran both Diginorm and Trinitynorm with a k-mer size (K) of 25.)

*Diginorm command:
normalize-by-median.py -p -C 30 -k 25 -N 4 -x 2.5e9 reads_shuffled.fastq

*Trinitynorm command:

normalize_by_kmer_coverage.pl --seqType fq --JM 100G --max_cov 30
--left left.fq --right right.fq --pairs_together --PARALLEL_STATS

The assembly was performed using paired-end reads. To run Diginorm, I
shuffled the reads before the normalization.

Thanks in advance, and sorry for the bad English,



-------------- next part --------------
A non-text attachment was scrubbed...
Name: results.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 13843 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130830/fb843d5b/attachment-0002.ods>
