Hi Titus,

For do-partition.py, the positional arguments are:

graphbase = ??
input_filenames = the .below files?

Where does the graphbase come from? I suppose I have to generate it via the
load-graph.py script? Or am I wrong?
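
For reference, this is roughly the kind of call I have in mind -- a sketch
only, with a made-up graphbase name ('mygraph'), placeholder file names, and
example -x/-N values:

  # Sketch: partition the .below file in a single step with do-partition.py.
  # 'mygraph' is my guess at the graphbase argument; -x/-N values are examples.
  do-partition.py -x 1e9 -N 4 mygraph reads.fa.keep.below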

Thanks again and again :)

Alexis

On 15/03/2013 17:46, C. Titus Brown wrote:

> On Fri, Mar 15, 2013 at 05:41:54PM +0100, Alexis Groppi wrote:
>
>> Hi Titus,
>>
>> With the great help of Eric, I went through the first 2 steps of
>> metagenome assembly. I'm now moving to step 3: partitioning.
>>
>> Given the number of reads (75 nt) I have per fasta file (between 2.5 and
>> 4 million), my questions are:
>>
>> - What would be your best choice for the workflow: load-graph,
>>   partition-graph, ... etc. as described in the handbook for "Large data
>>   sets" (which is not my case), or something else? (The handbook sequence
>>   I mean is sketched below.)
>> - More important, given my difficulties with the previous steps, what
>>   would you choose as parameters for those scripts (load-graph,
>>   partition-graph, ...)?
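>>
>> For concreteness, the handbook sequence I am referring to is roughly the
>> following -- a sketch from memory: file names and -x/-N values are
>> placeholders, and the follow-on script names and options would need
>> checking against the khmer documentation:
>>
>>   # Sketch of the multi-script route (build graph, partition, merge,
>>   # annotate reads, extract partitioned groups):
>>   load-graph.py -x 1e9 -N 4 mygraph reads.fa.keep.below
>>   partition-graph.py mygraph
>>   merge-partitions.py mygraph
>>   annotate-partitions.py mygraph reads.fa.keep.below
>>   extract-partitions.py mygraph reads.fa.keep.below.part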
<pre wrap="">
Hi Alexis,
the workflow in the handbook should work but is probably overkill. You might
take a look at the 'do-partition.py' script which will partition a single
file for you, doing all the steps.
The same -x and -N parameters should work for this, and use a lot less
memory.
best,
--titus
</pre>
>
>> Thanks again
>>
>> Alexis, khmer addict ;)
>>
>> On 08/03/2013 15:50, C. Titus Brown wrote:
>>
>>> On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
>>>
>>>> I'm starting to use your tools (khmer) for paleometagenomics analysis
>>>> (25,000-year-old DNA samples).
>>>>
>>>> In the handbook, for metagenome assembly, step 2 consists of trimming
>>>> sequences at a minimum k-mer abundance with filter-abund.py (in the
>>>> handbook the script is named filter-below-abund, but I guess it's the
>>>> same). The counting hash table <input.kh> must be built beforehand with
>>>> load-into-counting.py... but on the original fasta file, or on the .keep
>>>> file resulting from step 1 (normalize-by-median.py)?
<pre wrap="">Hi Alexis,
it's not the same -- see 'sandbox/filter-below-abund.py'. This one
gets rid of repeats, while filter-abund will eliminate real info from
your data set (low-abundance components that show up in metag data).
Use --savehash to generate a hash table on the normalize-by-median step (step
#1), OR use load-into-counting on the .keep file. That is, you want to
run it on the results of digital normalization.
cheers,
--titus
</pre>
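>>>
>>> In other words, roughly one of these two routes -- a sketch only, with
>>> placeholder file names and example -k/-x/-N values; the exact option
>>> names should be checked against each script's --help:
>>>
>>>   # Option A: save the counting table during digital normalization,
>>>   # then filter the .keep file against it.
>>>   normalize-by-median.py -k 20 -x 1e9 -N 4 --savehash reads.kh reads.fa
>>>   python sandbox/filter-below-abund.py reads.kh reads.fa.keep
>>>
>>>   # Option B: build the counting table afterwards from the .keep file.
>>>   load-into-counting.py -k 20 -x 1e9 -N 4 reads.kh reads.fa.keep
>>>   python sandbox/filter-below-abund.py reads.kh reads.fa.keep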