<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Adina,<br>
<br>
Thanks for your very clear and very useful comments.<br>
They match my own thoughts ;)<br>
I'm now starting again from scratch, following your advice.<br>
My questions are about which species (human? bacterial? other?)
are present in the sequenced ancient DNA (~25,000 years old).<br>
<br>
Cheers from Bordeaux<br>
<br>
Alexis<br>
<br>
<div class="moz-cite-prefix">On 25/03/2013 16:18, Adina Chuang Howe
wrote:<br>
</div>
<blockquote
cite="mid:CAO-C1xUmnzfaaHKVEbcW46xU=9O3Kx+Pu006Eb9ySs_QR-QsHA@mail.gmail.com"
type="cite">
<div>Hi Alexis,</div>
<div><br>
</div>
See below for comments.
<div><br>
</div>
<div><br>
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
Date: Mon, 25 Mar 2013 15:29:19 +0100<br>
From: Alexis Groppi <<a moz-do-not-send="true"
href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>><br>
Subject: [khmer] Dealing with paired-End Data<br>
To: "<a moz-do-not-send="true"
href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>"
<<a moz-do-not-send="true"
href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>><br>
Cc: "C. Titus Brown" <<a moz-do-not-send="true"
href="mailto:ctb@msu.edu">ctb@msu.edu</a>><br>
Message-ID: <<a moz-do-not-send="true"
href="mailto:51505F3F.3020906@u-bordeaux2.fr">51505F3F.3020906@u-bordeaux2.fr</a>><br>
<br>
Hi Titus,<br>
<br>
This may be a very dumb question:<br>
How should I deal with paired-end data (75 nt Illumina reads)?<br>
For some samples, I have paired-end data, i.e. two .fastq files<br>
(SampleN_R1.fastq and SampleN_R2.fastq).<br>
What is the best strategy?<br>
a/ Treat each file (R1 and R2) separately (normalization,
filtering,<br>
partitioning)? But then how should I handle the resulting .part
files<br>
from R1 and R2 for assembly?<br>
</blockquote>
<div><br>
</div>
<div>We have implemented a couple of paired-end options in khmer
for users, which take two forms:</div>
<div><br>
</div>
<div>Keep paired ends always:</div>
<div><br>
</div>
<div>There is an option within khmer (--paired) to retain
paired-end information: if digital normalization retains one
read of a pair, its mate is also retained, regardless of the
mate's own coverage in the dataset.</div>
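<div><br>
</div>
<div>The --paired rule above can be sketched in Python. This is a
minimal illustration, not khmer's implementation: an exact Counter
stands in for khmer's probabilistic k-mer counting table, and the
k-mer size K and cutoff CUTOFF are hypothetical toy values. Each
read is scored by the median count of its k-mers among reads
already kept, and a pair is kept when the less-covered mate is
still below the cutoff, so the other mate comes along whatever its
own coverage.</div>
<pre wrap="">
```python
from collections import Counter
from statistics import median

K = 4       # toy k-mer size (hypothetical; real runs use a larger k)
CUTOFF = 2  # toy coverage cutoff (hypothetical value)


def kmers(seq, k=K):
    """Yield every overlapping k-mer in seq (seq must be at least k long)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def median_count(counts, seq, k=K):
    """Median abundance, among reads kept so far, of the k-mers in seq."""
    return median(counts[km] for km in kmers(seq, k))


def normalize_pairs(pairs, cutoff=CUTOFF, k=K):
    """Sketch of digital normalization that never splits a pair.

    A pair is kept if EITHER mate is still below the coverage cutoff;
    the other mate is then kept regardless of its own coverage.
    """
    counts = Counter()  # exact counts stand in for khmer's counting table
    kept = []
    for r1, r2 in pairs:
        lowest = min(median_count(counts, r1, k), median_count(counts, r2, k))
        if cutoff > lowest:  # i.e. at least one mate is below the cutoff
            kept.append((r1, r2))
            counts.update(kmers(r1, k))  # in this sketch both mates
            counts.update(kmers(r2, k))  # add to the coverage counts
    return kept
```
</pre>
<div>With these toy settings, feeding the same pair three times
keeps it twice, while a later pair whose R1 is redundant but whose
R2 is new is kept whole.</div>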
<div><br>
</div>
<div>Currently, the only implementation we have for this (as
far as I know) requires the two reads of each pair to be
adjacent to each other in a single file. Depending on
the sequencing facility, you may have to combine the R1 and R2
files into one interleaved file with a script like <a
moz-do-not-send="true"
href="https://github.com/ged-lab/khmer/blob/master/sandbox/interleave.py">https://github.com/ged-lab/khmer/blob/master/sandbox/interleave.py</a>.</div>
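<div><br>
</div>
<div>If your facility delivers R1 and R2 as separate files,
interleaving amounts to reading both files in lockstep and writing
mates one after the other. The following is a minimal sketch of
that idea, not the interleave.py script linked above: it assumes
plain 4-line FASTQ records in the same order in both files, with
mates sharing a read name apart from an optional /1 or /2 suffix.
The function names (read_fastq, base_name, interleave,
write_interleaved) are hypothetical.</div>
<pre wrap="">
```python
from itertools import zip_longest


def read_fastq(lines):
    """Group FASTQ lines into (header, seq, plus, qual) tuples.

    Assumes plain 4-line records (no line-wrapped sequences).
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def base_name(header):
    """Read name without a trailing /1 or /2, so mates can be compared."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def interleave(r1_lines, r2_lines):
    """Yield R1 and R2 records alternately, checking that mates match."""
    for rec1, rec2 in zip_longest(read_fastq(r1_lines), read_fastq(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if base_name(rec1[0]) != base_name(rec2[0]):
            raise ValueError(f"mates out of sync: {rec1[0]} vs {rec2[0]}")
        yield rec1
        yield rec2


def write_interleaved(r1_path, r2_path, out_path):
    """Write one interleaved FASTQ file from separate R1/R2 files."""
    with open(r1_path) as f1, open(r2_path) as f2, open(out_path, "w") as out:
        for rec in interleave(f1, f2):
            out.write("\n".join(rec) + "\n")
```
</pre>
<div>For example, write_interleaved("SampleN_R1.fastq",
"SampleN_R2.fastq", "SampleN_interleaved.fastq") would write each
R2 record directly after its R1 mate (the output file name is only
illustrative).</div>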
<div><br>
</div>
<div>If you do turn this option off, keep in mind that
diginorm's decision to retain or discard a read depends on the
order in which reads are fed in: among reads that contain the
same information, diginorm keeps the first ones it sees until
the coverage threshold is reached and discards the rest. The
take-home message is to feed in your best reads first.</div>
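<div><br>
</div>
<div>One way to act on this advice is to sort reads by quality
before normalization. The sketch below assumes that "best" means
highest mean Phred score and that qualities use Phred+33 encoding;
both are assumptions, and neither function is part of khmer. It
also sorts everything in memory, which may not be practical for
large datasets. For interleaved paired data, whole pairs are sorted
so that mates stay adjacent.</div>
<pre wrap="">
```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (assumes Phred+33)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def best_first(records):
    """Sort (header, seq, plus, qual) records by descending mean quality.

    sorted() is stable even with reverse=True, so reads with equal
    scores keep their original relative order.
    """
    return sorted(records, key=lambda rec: mean_phred(rec[3]), reverse=True)


def best_pairs_first(records):
    """Same idea for interleaved data: sort whole pairs so mates stay adjacent."""
    pairs = list(zip(records[0::2], records[1::2]))
    # Score each pair by the mean quality over both mates combined.
    pairs.sort(key=lambda p: mean_phred(p[0][3] + p[1][3]), reverse=True)
    return [rec for pair in pairs for rec in pair]
```
</pre>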
<div><br>
</div>
<div>Use whatever paired-end information remains for assembly:</div>
<div><br>
</div>
<div>You can still run assemblies with paired ends even if you
turn off the paired-end retention option in diginorm: the
strip-and-split-for-assembly script separates the reads that
remain after diginorm into paired-end reads and single-end reads.</div>
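<div><br>
</div>
<div>The splitting half of that step can be sketched as follows.
This is not the khmer script itself, only a minimal illustration
assuming interleaved input in which mates share a read name apart
from a /1 or /2 suffix; split_pairs and base_name are hypothetical
names. Reads whose mate also survived diginorm go to the paired
list, for the assembler's paired-end input, and orphaned reads go
to the single-end list.</div>
<pre wrap="">
```python
def base_name(header):
    """Read name without a trailing /1 or /2, so mates can be compared."""
    name = header.split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def split_pairs(records):
    """Split interleaved survivors into complete pairs and orphaned reads.

    Assumes records keep their original interleaved order, so a read's
    mate, if it survived normalization, immediately follows it.
    """
    pairs, singles = [], []
    pending = None  # a read still waiting to see whether its mate follows
    for rec in records:
        if pending is not None and base_name(pending[0]) == base_name(rec[0]):
            pairs.extend([pending, rec])  # both mates survived
            pending = None
        else:
            if pending is not None:
                singles.append(pending)  # its mate was discarded
            pending = rec
    if pending is not None:
        singles.append(pending)
    return pairs, singles
```
</pre>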
<div><br>
</div>
<div>Which to choose:</div>
<div>Which option to choose really depends on your question and
the kind of coverage you expect your dataset to have. For
complex metagenomes, I have to balance data reduction against
paired-end information to be able to complete my assemblies
efficiently. It's difficult to give advice on this without
knowing what your questions are. </div>
<div><br>
</div>
<div>If you're focused on scaffolding and longer assemblies in
general, you may want to prioritize retaining your paired
ends. If you're having trouble completing assemblies at all,
you might try discarding more data at the cost of losing
paired-end information. </div>
<div><br>
</div>
<div>I've found that assembly involves a lot of trial and error,
and the result can always be improved and keeps changing. Given
this, there's no clear workflow that I can recommend for every
user, except to get your data to a point where rapid
exploration is possible. I've started to work with
aggressively quality-trimmed data, in which I lose paired-end
information all the time, so nowadays I tend not to worry about
retaining paired ends in my workflow. </div>
<div><br>
</div>
<div>Hope this helps and good luck,</div>
<div>Adina</div>
<div><br>
</div>
<div><br>
</div>
</div>
</div>
<br>
<br>
<pre wrap="">_______________________________________________
khmer mailing list
<a class="moz-txt-link-abbreviated" href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>
<a class="moz-txt-link-freetext" href="http://lists.idyll.org/listinfo/khmer">http://lists.idyll.org/listinfo/khmer</a>
</pre>
</blockquote>
<br>
</body>
</html>