[khmer] Counting hash table for step 2 of metagenome assembly

Alexis Groppi alexis.groppi at u-bordeaux2.fr
Fri Mar 15 10:09:18 PDT 2013


Hi Titus,

For do-partition.py, the positional arguments are:
   graphbase = ??
   input_filenames = the .below files?

Where does the graphbase come from? I suppose I have to generate it via
the load-graph.py script?
Or am I wrong?
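
To make my question concrete, here is a sketch of the call I am trying
to build (all values and names are placeholders):

   do-partition.py -k 32 -x 4e9 -N 4 <graphbase?> reads.keep.below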

Thanks again and again :)

Alexis

On 15/03/2013 17:46, C. Titus Brown wrote:
> On Fri, Mar 15, 2013 at 05:41:54PM +0100, Alexis Groppi wrote:
>> Hi Titus,
>>
>> With the great help of Eric, I went through the first two steps of
>> metagenome assembly.
>> I'm moving on to step 3: partitioning.
>> My questions are:
>> Given the reads I have (75 nt, between 2.5 and 4 million per fasta
>> file),
>> what would be your best choice for the workflow?
>> - load-graph, partition-graph, etc., as described in the handbook
>> for "Large data sets" (which is not my case), or something else?
>> (I sketch the handbook sequence below.)
>> - More important, given my difficulties with the previous steps, what
>> parameters would you choose for those scripts:
>> load-graph, partition-graph, ...?
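>>
>> For reference, the handbook sequence I mean is roughly this (a sketch
>> from memory, with placeholder file names and table sizes):
>>
>>    load-graph.py -k 32 -x 4e9 -N 4 mygraph reads.keep.below
>>    partition-graph.py mygraph
>>    merge-partitions.py mygraph
>>    annotate-partitions.py mygraph reads.keep.below
>>    extract-partitions.py mygraph reads.keep.below.part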
> Hi Alexis,
>
> the workflow in the handbook should work, but it is probably overkill.  You
> might take a look at the 'do-partition.py' script, which will partition a
> single file for you, doing all the steps.
>
> The same -x and -N parameters should work for this, and it will use a lot
> less memory.
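>
> For example, something like this (a sketch; 'mygraph' and the input
> name are placeholders, and -x/-N should match whatever worked for you
> before):
>
>    do-partition.py -k 32 -x 4e9 -N 4 mygraph reads.keep.below
>
> Here graphbase ('mygraph') is just a prefix for the output files, so
> nothing needs to be built beforehand.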
>
> best,
> --titus
>
>> Thanks again
>>
>> Alexis, khmer addict ;)
>>
>> On 08/03/2013 15:50, C. Titus Brown wrote:
>>> On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
>>>> I'm starting to use your tools (khmer) for paleometagenomics analysis
>>>> (25,000-year-old DNA samples).
>>>> In the handbook, for metagenome assembly, step 2 consists of trimming
>>>> sequences at a minimum k-mer abundance with filter-abund.py (in the
>>>> handbook the script is named filter-below-abund, but I guess it's the
>>>> same).
>>>> The counting hash table <input.kh> must be built beforehand with
>>>> load-into-counting.py... but on the original fasta file, or on the .keep
>>>> file resulting from step 1 (normalize-by-median.py)?
>>> Hi Alexis,
>>>
>>> it's not the same -- see 'sandbox/filter-below-abund.py'.  This one
>>> gets rid of repeats, while filter-abund will eliminate real info from
>>> your data set (low-abundance components that show up in metag data).
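>>>
>>> For example (a sketch; 'input.kh' and 'input.keep' are placeholder
>>> names, and if I remember right the cutoff is hard-coded inside the
>>> sandbox script):
>>>
>>>    python sandbox/filter-below-abund.py input.kh input.keep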
>>>
>>> Use --savehash to generate a hash table on the normalize-by-median step (step
>>> #1), OR use load-into-counting on the .keep file.  That is, you want to
>>> run it on the results of digital normalization.
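>>>
>>> Concretely, either of these should work (a sketch; the -k/-C/-x/-N
>>> values are placeholders):
>>>
>>>    normalize-by-median.py -C 20 -k 20 -x 4e9 -N 4 --savehash input.kh input.fasta
>>>
>>> or, on the .keep output of a normalize-by-median run:
>>>
>>>    load-into-counting.py -k 20 -x 4e9 -N 4 input.kh input.fasta.keep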
>>>
>>> cheers,
>>> --titus