[khmer] Counting hash table, step 2: metagenome assembly

C. Titus Brown ctb at msu.edu
Fri Mar 15 10:10:56 PDT 2013


On Fri, Mar 15, 2013 at 06:09:18PM +0100, Alexis Groppi wrote:
> Hi Titus,
>
> For do-partition.py, the positional arguments are:
>   graphbase = ??
>   input_filenames = the .below files?
>
> Where does the graphbase come from? I suppose I have to generate it via
> the load-graph.py script? Or am I wrong?
>
> Thanks again and again :)

Hi Alexis,

the graphbase is just a name you choose for the data set; it's only used
for temp file storage.
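
For example, something like this (a minimal sketch -- 'mygraph' and
'yourdata.below' are placeholder names, and the exact flags may differ
with your khmer version; the -x/-N advice is from my earlier mail):

    python scripts/do-partition.py -k 32 -x 1e9 -N 4 mygraph yourdata.below

The temp files and the final .part output take their names from the
graphbase ('mygraph' here) and the input files.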

--titus

> On 15/03/2013 17:46, C. Titus Brown wrote:
>> On Fri, Mar 15, 2013 at 05:41:54PM +0100, Alexis Groppi wrote:
>>> Hi Titus,
>>>
>>> With the great help of Eric, I went through the first 2 steps of
>>> Metagenome Assembly.
>>> I'm moving on to step 3: partitioning.
>>> My questions are:
>>> Given the number of 75 nt reads I have per fasta file (between 2.5 and
>>> 4 million), what would be your best choice of workflow?
>>> - load-graph, partition-graph, etc., as described in the handbook for
>>> "Large data sets" (which is not my case)? Or something else?
>>> - More importantly, given my difficulties with the previous steps, what
>>> parameters would you choose for the scripts load-graph,
>>> partition-graph, ...?
>> Hi Alexis,
>>
>> the workflow in the handbook should work but is probably overkill.  You
>> might take a look at the 'do-partition.py' script, which will partition a
>> single file for you, doing all the steps.
>>
>> The same -x and -N parameters should work for this, and it will use a lot
>> less memory.
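>>
>> For reference, the multi-step handbook workflow looks roughly like this
>> (a sketch only; 'mygraph' and 'yourdata.fa' are placeholders, and flags
>> and script locations may vary by khmer version):
>>
>>     python scripts/load-graph.py -k 32 -x 1e9 -N 4 mygraph yourdata.fa
>>     python scripts/partition-graph.py mygraph
>>     python scripts/merge-partitions.py mygraph
>>     python scripts/annotate-partitions.py mygraph yourdata.fa
>>     python scripts/extract-partitions.py mygraph yourdata.fa.part
>>
>> do-partition.py rolls all of those steps into one command.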
>>
>> best,
>> --titus
>>
>>> Thanks again
>>>
>>> Alexis, khmer addict ;)
>>>
>>> On 08/03/2013 15:50, C. Titus Brown wrote:
>>>> On Fri, Mar 08, 2013 at 12:12:38PM +0100, Alexis Groppi wrote:
>>>>> I'm starting to use your tools (khmer) for paleometagenomics analysis
>>>>> (25,000-year-old DNA samples).
>>>>> In the handbook, for metagenome assembly, step 2 consists of trimming
>>>>> sequences at a minimum k-mer abundance with filter-abund.py (in the
>>>>> handbook the script is named filter-below-abund, but I guess it's the
>>>>> same).
>>>>> The counting hash table <input.kh> must be built beforehand with
>>>>> load-into-counting.py... but on the original fasta file, or on the
>>>>> .keep file resulting from step 1 (normalize-by-median.py)?
>>>> Hi Alexis,
>>>>
>>>> it's not the same -- see 'sandbox/filter-below-abund.py'.  That one
>>>> gets rid of repeats, while filter-abund will eliminate real information
>>>> from your data set (the low-abundance components that show up in
>>>> metagenomic data).
>>>>
>>>> Use --savehash to generate a hash table during the normalize-by-median
>>>> step (step #1), OR use load-into-counting on the .keep file.  That is,
>>>> you want to run it on the results of digital normalization.
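>>>>
>>>> Concretely, the two equivalent routes look something like this (a
>>>> sketch -- the file names are placeholders and the flag values are
>>>> illustrative, not recommendations; check your khmer version's help):
>>>>
>>>>     # route A: save the counting hash during digital normalization
>>>>     python scripts/normalize-by-median.py -k 20 -C 20 -x 1e9 -N 4 \
>>>>         --savehash yourdata.kh yourdata.fa
>>>>     python sandbox/filter-below-abund.py yourdata.kh yourdata.fa.keep
>>>>
>>>>     # route B: build the counting hash from the .keep file afterwards
>>>>     python scripts/load-into-counting.py -k 20 -x 1e9 -N 4 \
>>>>         yourdata.kh yourdata.fa.keep
>>>>     python sandbox/filter-below-abund.py yourdata.kh yourdata.fa.keep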
>>>>
>>>> cheers,
>>>> --titus

-- 
C. Titus Brown, ctb at msu.edu



