<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi Adina,<br>

    <br>

    Thanks for your very clear and very useful comments.<br>

    They match my own thinking ;)<br>

    I'm now starting from scratch, following your advice.<br>

    The question is which species (human, bacterial, other?)
    were present in the sequenced ancient DNA (~25,000 years old).<br>

    <br>

    Cheers from Bordeaux<br>

    <br>

    Alexis<br>

    <br>

    <div class="moz-cite-prefix">On 25/03/2013 16:18, Adina Chuang Howe
      wrote:<br>

    </div>

    <blockquote

cite="mid:CAO-C1xUmnzfaaHKVEbcW46xU=9O3Kx+Pu006Eb9ySs_QR-QsHA@mail.gmail.com"

      type="cite">

      <div>Hi Alexis,</div>

      <div><br>

      </div>

      See below for comments.

      <div><br>

      </div>

      <div><br>

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            &nbsp; &nbsp;1. Dealing with paired-End Data (Alexis Groppi)<br>

            <br>

            <br>

----------------------------------------------------------------------<br>

            <br>

            Message: 1<br>

            Date: Mon, 25 Mar 2013 15:29:19 +0100<br>

            From: Alexis Groppi &lt;<a moz-do-not-send="true"

              href="mailto:alexis.groppi@u-bordeaux2.fr">alexis.groppi@u-bordeaux2.fr</a>&gt;<br>

            Subject: [khmer] Dealing with paired-End Data<br>

            To: "<a moz-do-not-send="true"

              href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>"

            &lt;<a moz-do-not-send="true"

              href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a>&gt;<br>

            Cc: "C. Titus Brown" &lt;<a moz-do-not-send="true"

              href="mailto:ctb@msu.edu">ctb@msu.edu</a>&gt;<br>

            Message-ID: &lt;<a moz-do-not-send="true"

              href="mailto:51505F3F.3020906@u-bordeaux2.fr">51505F3F.3020906@u-bordeaux2.fr</a>&gt;<br>

            Content-Type: text/plain; charset="iso-8859-1";

            Format="flowed"<br>

            <br>

            Hi Titus,<br>

            <br>

            Maybe a very dumb question:<br>
            How should I deal with paired-end data (75 nt Illumina reads)?<br>
            For some samples I have paired-end data, meaning two .fastq
            files<br>
            (SampleN_R1.fastq and SampleN_R2.fastq).<br>
            What is the best strategy:<br>
            a/ Treat each file (R1 and R2) separately (normalization,
            filtering,<br>
            partitioning)? But then how do I deal with the resulting
            .part files<br>
            from R1 and R2 for assembly?<br>

          </blockquote>

          <div><br>

          </div>

          <div>khmer implements a couple of paired-end options for users,
            which take two forms:</div>

          <div><br>

          </div>

          <div>Keep paired ends always:</div>

          <div><br>

          </div>

          <div>There is an option within khmer (--paired) to retain
            paired-end information, i.e., if digital normalization retains
            one read of a pair, its mate will also be retained regardless
            of its coverage within the dataset. &nbsp;</div>

          <div><br>

          </div>

          <div>Currently, the only implementation we have for this (as
            far as I know) requires that the two reads of each pair be
            adjacent to each other within your dataset. &nbsp;Depending on
            how your sequencing facility delivers the data, you may have
            to merge the R1 and R2
            files into one file with a script like <a

              moz-do-not-send="true"

href="https://github.com/ged-lab/khmer/blob/master/sandbox/interleave.py">https://github.com/ged-lab/khmer/blob/master/sandbox/interleave.py</a></div>
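          <div><br>
          </div>
          <div>To illustrate what "adjacent" means here, below is a
            minimal sketch of interleaving, not the linked script itself.
            It assumes plain 4-line FASTQ records and that R1 and R2 list
            the reads in the same order; the function names read_fastq and
            interleave and the sample reads are made up for the
            example.</div>
          <pre wrap="">
```python
import io
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write R1 and R2 records alternately and return the number of pairs."""
    pairs = 0
    for rec1, rec2 in zip_longest(read_fastq(r1_handle), read_fastq(r2_handle)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)
        pairs += 1
    return pairs


# Tiny in-memory demo standing in for SampleN_R1.fastq and SampleN_R2.fastq.
r1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n@read2/1\nTTGA\n+\nIIII\n")
r2 = io.StringIO("@read1/2\nTGCA\n+\nIIII\n@read2/2\nCCAT\n+\nIIII\n")
out = io.StringIO()
n_pairs = interleave(r1, r2, out)
names = out.getvalue().splitlines()[::4]  # header line of every record
print(n_pairs, names)
```
</pre>
          <div>In the output each /1 read is immediately followed by its
            /2 mate, which is the layout the paired option expects.</div>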

          <div><br>

          </div>

          <div>If you do turn this option off, you should keep in mind
            that diginorm decides whether to retain each read in the
            order in which the reads are given as input.
            &nbsp;Among reads that contain the same information, diginorm
            keeps the first ones it sees and discards the later ones once
            coverage reaches the threshold. &nbsp;The take-home message
            here is to feed in your best reads
            first.</div>
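          <div><br>
          </div>
          <div>The order dependence can be seen in a toy sketch of the
            diginorm idea: keep a read only if the median count of its
            k-mers, among the reads kept so far, is still below a cutoff.
            This is an illustration only. It uses an exact Python Counter
            rather than khmer's own counting structure, and the function
            names, k=4, cutoff=2 and the reads are assumptions chosen to
            keep the example small.</div>
          <pre wrap="">
```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=4, cutoff=2):
    """Return the names of the reads kept, processing them in input order."""
    counts = Counter()
    kept = []
    for name, seq in reads:
        words = kmers(seq, k)
        if not words:
            continue  # read shorter than k
        if median(counts[w] for w in words) >= cutoff:
            continue  # this read is already covered by earlier reads
        counts.update(words)
        kept.append(name)
    return kept


# Three reads carrying the same sequence; only their order differs.
seq = "ACGTTGCAAG"
reads = [("best", seq), ("good", seq), ("poor", seq)]
kept_forward = normalize(reads)
kept_reversed = normalize(list(reversed(reads)))
print(kept_forward, kept_reversed)
```
</pre>
          <div>Whichever read comes last is the one that gets dropped, so
            putting the low-quality read first means it is kept and the
            best one is discarded.</div>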

          <div><br>

          </div>

          <div>Use any paired end information for assembly:</div>

          <div><br>

          </div>

          <div>Assemblies can still be run with paired ends even if you
            turn off the paired-end retention parameter in diginorm: the
            strip and split for assembly script separates the paired-end
            reads from the single-end reads that remain after
            diginorm.</div>
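          <div><br>
          </div>
          <div>As a rough sketch of that separation step (not the khmer
            script itself), the code below walks over the read names that
            survive diginorm and sorts them into intact pairs and orphaned
            singles. It assumes mates are named with /1 and /2 suffixes
            and that surviving mates are still adjacent; split_pairs and
            base_name are hypothetical helper names.</div>
          <pre wrap="">
```python
def base_name(name):
    """Strip a trailing /1 or /2 mate suffix (assumed naming convention)."""
    if name.endswith("/1") or name.endswith("/2"):
        return name[:-2]
    return name


def split_pairs(names):
    """Split an ordered list of read names into (pairs, singles)."""
    paired, single = [], []
    i = 0
    while len(names) > i:
        current = names[i]
        has_next = len(names) > i + 1
        if (has_next
                and current.endswith("/1")
                and names[i + 1].endswith("/2")
                and base_name(current) == base_name(names[i + 1])):
            paired.append((current, names[i + 1]))
            i += 2  # consume both mates
        else:
            single.append(current)
            i += 1
    return paired, single


# Reads left after normalization: r2 lost its /2 mate, r3 lost its /1 mate.
survivors = ["r1/1", "r1/2", "r2/1", "r3/2", "r4/1", "r4/2"]
paired, single = split_pairs(survivors)
print(paired, single)
```
</pre>
          <div>The two lists can then be written to separate files and
            given to the assembler as paired and single-end inputs.</div>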

          <div><br>

          </div>

          <div>Which to choose:</div>

          <div>What you choose really depends on your
            question and the type of coverage you think you have for
            your dataset. &nbsp;For complex metagenomes, I have to balance
            data reduction against paired-end information in order to be
            able to complete my assemblies efficiently. &nbsp;It's difficult
            to provide advice on this without knowing what your
            questions are. &nbsp;</div>

          <div><br>

          </div>

          <div>If you're focused on scaffolding and longer assemblies in

            general, maybe you want to prioritize the retention of your

            paired ends. If you're having trouble completing assemblies

            at all, you might try discarding more data at the cost of

            paired ends.&nbsp;</div>

          <div><br>

          </div>

          <div>I've found that assembly involves much trial and error,
            with a result that you can always improve upon and that can
            constantly change. &nbsp;Given this, there's no clear workflow
            that I can recommend for every user, except to get your
            data to a point where rapid exploration can occur. &nbsp;I've
            started to work with aggressively quality-trimmed data, in
            which I lose paired-end information all the time, so nowadays
            I tend not to worry about retaining paired ends in my
            workflow. &nbsp;</div>

          <div><br>

          </div>

          <div>Hope this helps and good luck,</div>

          <div>Adina</div>

          <div><br>

          </div>

          <div><br>

          </div>

        </div>

      </div>

      <br>


    </blockquote>

    <br>


  </body>

</html>