[khmer] Counting hash Table step 2 Metagenome assembly

C. Titus Brown ctb at msu.edu
Fri Mar 15 09:46:40 PDT 2013


On Fri, Mar 15, 2013 at 05:41:54PM +0100, Alexis Groppi wrote:
> Hi Titus,
>
> With the great help of Eric, I went through the first 2 steps of  
> Metagenome Assembly.
> I'm moving on to step 3: partitioning.
> My questions are:
> Given the number of reads (75 nt) I have per fasta file (between 2.5
> and 4 million), what would be your best choice for the workflow?
> - load-graph, partition-graph, etc., as described in the handbook
> for "Large data sets", which is not my case? Or something else?
> - More importantly, given my difficulties with the previous steps,
> what parameters would you choose for the scripts
> (load-graph, partition-graph, ...)?

Hi Alexis,

the workflow in the handbook should work, but it is probably overkill.
You might take a look at the 'do-partition.py' script, which will
partition a single file for you, doing all the steps.

The same -x and -N parameters should work for this, and it will use a
lot less memory.
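
For example, here is a minimal sketch (assuming the khmer script names
and options as of early 2013; the -k value is illustrative, and -x/-N
should match what worked for your earlier steps):

   do-partition.py -k 32 -x 1e9 -N 4 graphbase input.fa.keep

That does the load/partition/annotate steps in one go and leaves a
partition-annotated file (input.fa.keep.part) that you can then split
into group files with extract-partitions.py.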

best,
--titus

>
> Thanks again
>
> Alexis, khmer addict ;)
>
> On 08/03/2013 15:50, C. Titus Brown wrote:
>> On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
>>> I'm starting to use your tools (khmer) for paleometagenomics
>>> analysis (25,000-year-old DNA samples).
>>> In the Handbook, for metagenome assembly, step 2 consists of
>>> trimming sequences at a minimum k-mer abundance with
>>> filter-abund.py (in the handbook the script is named
>>> filter-below-abund, but I guess it's the same).
>>> The counting hash table <input.kh> must be built beforehand with
>>> load-into-counting.py... but on the original fasta file or on the
>>> .keep file resulting from step 1 (normalize-by-median.py)?
>> Hi Alexis,
>>
>> it's not the same -- see 'sandbox/filter-below-abund.py'.  This one
>> gets rid of repeats, while filter-abund will eliminate real info from
>> your data set (low-abundance components that show up in metag data).
>>
>> Use --savehash to generate a hash table on the normalize-by-median step (step
>> #1), OR use load-into-counting on the .keep file.  That is, you want to
>> run it on the results of digital normalization.
>>
>> cheers,
>> --titus
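
(For concreteness, that step might look something like this -- a
sketch only, assuming the khmer script names and options as of early
2013, with illustrative parameter values:

   normalize-by-median.py -k 20 -C 20 -x 1e9 -N 4 \
       --savehash input.kh input.fa

writes input.fa.keep and saves the counting hash to input.kh; or you
can build the hash afterwards from the .keep file with

   load-into-counting.py -k 20 -x 1e9 -N 4 input.kh input.fa.keep

and then trim high-abundance repeats with

   python sandbox/filter-below-abund.py input.kh input.fa.keep

Exact output naming can vary between khmer versions.)
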
>
> -- 

-- 
C. Titus Brown, ctb at msu.edu



