[khmer] Counting hash Table step 2 Metagenome assembly

Fri Mar 8 06:50:37 PST 2013

On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
> I'm starting to use your tools (khmer) for paleometagenomics analysis
> (25,000-year-old DNA samples).
> In the Handbook, for metagenome assembly, step 2 consists of trimming
> sequences at a minimum k-mer abundance with filter-abund.py (in the handbook
> the script is named filter-below-abund, but I guess it's the same).
> The counting hash table <input.kh> must be built beforehand with
> load-into-counting.py... but on the original FASTA file, or on the .keep
> file resulting from step 1 (normalize-by-median.py)?

Hi Alexis,

it's not the same; see 'sandbox/filter-below-abund.py'. That script
gets rid of repeats, while filter-abund.py will eliminate real information
from your data set (the low-abundance components that show up in
metagenomic data).
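
A toy Python sketch may make the difference concrete. It assumes exact k-mer counts in a plain Counter (khmer itself uses a fixed-memory probabilistic counting table), and the function names, k=3, and the cutoff of 2 are hypothetical, chosen only for illustration; none of this is khmer's code. Trimming a read at its first k-mer whose count is below the cutoff (the filter-abund.py idea) removes low-abundance sequence, while trimming at the first k-mer whose count is above the cutoff (the filter-below-abund.py idea) removes repeats.

```python
from collections import Counter

# Toy stand-in for khmer's counting table: exact k-mer counts in a Counter.
# All names here are hypothetical and for illustration only.

def count_kmers(reads, k):
    """Count every k-mer in every read."""
    counts = Counter()
    for read in reads:
        counts.update(read[i:i + k] for i in range(len(read) - k + 1))
    return counts

def trim_at_first_failure(read, counts, k, keep):
    """Return the prefix of `read` covered by k-mers whose count passes
    `keep(count)`, stopping at the first k-mer that fails."""
    for i in range(len(read) - k + 1):
        if not keep(counts[read[i:i + k]]):
            # Keep through the end of the last passing k-mer (none if i == 0).
            return read[:i + k - 1] if i > 0 else ""
    return read

def trim_low_abund(read, counts, k, cutoff):
    # filter-abund.py idea: stop at the first k-mer seen fewer than `cutoff` times.
    return trim_at_first_failure(read, counts, k, lambda c: c >= cutoff)

def trim_high_abund(read, counts, k, cutoff):
    # filter-below-abund.py idea: stop at the first k-mer seen more than `cutoff` times.
    return trim_at_first_failure(read, counts, k, lambda c: c <= cutoff)

reads = ["CGTACC", "CGTACC", "CGTAGG", "TCAAAAA"]
counts = count_kmers(reads, 3)
print(trim_low_abund("CGTAGG", counts, 3, 2))    # CGTA  (TAG is seen only once)
print(trim_high_abund("TCAAAAA", counts, 3, 2))  # TCAA  (AAA is seen 3 times)
```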

Either use --savehash to save the counting hash table from the
normalize-by-median step (step #1), OR run load-into-counting.py on the
.keep file. Either way, you want the table built from the results of
digital normalization, not from the original reads.
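
Why either route works can be sketched the same way. The toy below assumes exact counts and uses hypothetical function names (not khmer's API): digital normalization keeps a read only when the median count of its k-mers, among the reads kept so far, is below a coverage cutoff, and only kept reads are added to the table. In this toy, the table saved at the end of that pass holds exactly the same counts as one rebuilt from the kept reads afterwards.

```python
from collections import Counter
from statistics import median

# Toy sketch with exact counts; function names are hypothetical, not khmer's API.

def kmers(read, k):
    """All overlapping k-mers of a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def diginorm(reads, k, cutoff):
    """Keep a read only if the median count of its k-mers (over reads kept so
    far) is below `cutoff`; only kept reads are added to the table."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers and median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept, counts  # `counts` plays the role of the --savehash table

def count_kmers(reads, k):
    """Rebuild a table from scratch, like counting the .keep file afterwards."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

reads = ["GATTACA"] * 10 + ["CCCGGGA"]
kept, saved = diginorm(reads, k=3, cutoff=3)
recounted = count_kmers(kept, k=3)
original = count_kmers(reads, k=3)

print(len(kept))                      # 4
print(saved == recounted)             # True
print(saved["GAT"], original["GAT"])  # 3 10
```

Counting the original reads instead gives GAT a count of 10 rather than 3, so a later abundance cutoff would be judged against pre-normalization coverage.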

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu