Hi Titus,

For do-partition.py, the positional arguments are:

graphbase = ??
input_filenames = the .below files?

Where does the graphbase come from? I suppose I have to generate it via the
load-graph.py script? Or am I wrong?
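
For reference, this is roughly the kind of call I have in mind -- a sketch
only, with a made-up graphbase name ('mygraph'), placeholder file names, and
example -x/-N values:

  # Sketch: partition the .below file in a single step with do-partition.py.
  # 'mygraph' is my guess at the graphbase argument; -x/-N values are examples.
  do-partition.py -x 1e9 -N 4 mygraph reads.fa.keep.below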

Thanks again and again :)

Alexis

On 15/03/2013 17:46, C. Titus Brown wrote:

> On Fri, Mar 15, 2013 at 05:41:54PM +0100, Alexis Groppi wrote:
>
>> Hi Titus,
>>
>> With the great help of Eric, I went through the first 2 steps of
>> metagenome assembly. I'm now moving to step 3: partitioning.
>>
>> Given the number of reads (75 nt) I have per fasta file (between 2.5 and
>> 4 million), my questions are:
>>
>> - What would be your best choice for the workflow: load-graph,
>>   partition-graph, ... etc. as described in the handbook for "Large data
>>   sets" (which is not my case), or something else? (The handbook sequence
>>   I mean is sketched below.)
>> - More important, given my difficulties with the previous steps, what
>>   would you choose as parameters for those scripts (load-graph,
>>   partition-graph, ...)?
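>>
>> For concreteness, the handbook sequence I am referring to is roughly the
>> following -- a sketch from memory: file names and -x/-N values are
>> placeholders, and the follow-on script names and options would need
>> checking against the khmer documentation:
>>
>>   # Sketch of the multi-script route (build graph, partition, merge,
>>   # annotate reads, extract partitioned groups):
>>   load-graph.py -x 1e9 -N 4 mygraph reads.fa.keep.below
>>   partition-graph.py mygraph
>>   merge-partitions.py mygraph
>>   annotate-partitions.py mygraph reads.fa.keep.below
>>   extract-partitions.py mygraph reads.fa.keep.below.part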
<pre wrap="">
Hi Alexis,
the workflow in the handbook should work but is probably overkill. You might
take a look at the 'do-partition.py' script which will partition a single
file for you, doing all the steps.
The same -x and -N parameters should work for this, and use a lot less
memory.
best,
--titus
</pre>
>
>> Thanks again
>>
>> Alexis, khmer addict ;)
>>
>> On 08/03/2013 15:50, C. Titus Brown wrote:
>>
>>> On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
>>>
>>>> I'm starting to use your tools (khmer) for paleometagenomics analysis
>>>> (25,000-year-old DNA samples).
>>>>
>>>> In the handbook, for metagenome assembly, step 2 consists of trimming
>>>> sequences at a minimum k-mer abundance with filter-abund.py (in the
>>>> handbook the script is named filter-below-abund, but I guess it's the
>>>> same). The counting hash table <input.kh> must be built beforehand with
>>>> load-into-counting.py... but on the original fasta file, or on the .keep
>>>> file resulting from step 1 (normalize-by-median.py)?
<pre wrap="">Hi Alexis,
it's not the same -- see 'sandbox/filter-below-abund.py'. This one
gets rid of repeats, while filter-abund will eliminate real info from
your data set (low-abundance components that show up in metag data).
Use --savehash to generate a hash table on the normalize-by-median step (step
#1), OR use load-into-counting on the .keep file. That is, you want to
run it on the results of digital normalization.
cheers,
--titus
</pre>
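>>>
>>> In other words, roughly one of these two routes -- a sketch only, with
>>> placeholder file names and example -k/-x/-N values; the exact option
>>> names should be checked against each script's --help:
>>>
>>>   # Option A: save the counting table during digital normalization,
>>>   # then filter the .keep file against it.
>>>   normalize-by-median.py -k 20 -x 1e9 -N 4 --savehash reads.kh reads.fa
>>>   python sandbox/filter-below-abund.py reads.kh reads.fa.keep
>>>
>>>   # Option B: build the counting table afterwards from the .keep file.
>>>   load-into-counting.py -k 20 -x 1e9 -N 4 reads.kh reads.fa.keep
>>>   python sandbox/filter-below-abund.py reads.kh reads.fa.keep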