[khmer] sandbox/filter-median-and-pct.py error

Thu Jun 12 03:58:40 PDT 2014

On Wed, May 28, 2014 at 11:27:28AM +0200, Joseph Tran wrote:
> Hello Michael
> 
> Thanks for your answer.
> 
> I understand now why this script was not functional under the master branch.
> The doc you mentioned was very useful for understanding how to use khmer in my context.
> 
> In the meantime, I installed khmer using the kmer_error_profile branch, and got the sandbox/filter-median-and-pct.py script to work.
> Can I use this version of the script?
> Do you recommend using this script, or should I use the main digital normalization protocol instead?
> 
> I think it would be useful to clean and reduce the amount of data in the particular context I am working on.
> I need to assemble several mitochondrial genomes, but the samples are all contaminated by chloroplast and nuclear DNA.
> The nuclear coverage is nearly 1x, so the assembly step should discard the corresponding reads.
> I have nearly 2000x coverage from HiSeq and 60-700x from MiSeq for the mitochondrion and the chloroplast.
> The idea was to normalize the coverage of the mitochondrion and the chloroplast so that a single k-mer size can be used in some meta-assembler.
> The assembler should then be able to distinguish the two genomes, taking into account all chimeric or repeat nodes.
> 
> If you have any advice, that would be great.

Joseph,

apologies for the delay -- this got lost in my inbox, which can be a very
scary place.

I would strongly recommend following the approach we used here,

http://www.pnas.org/content/early/2014/03/13/1402564111.abstract

which is written up here:

https://khmer-protocols.readthedocs.org/en/latest/metagenomics/index.html

Note that this is still a bit of a work in progress so feedback is welcome!
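
For intuition only, here is a toy sketch of the idea behind digital
normalization: keep a read only while the median count of its k-mers is
still below a cutoff. This is not khmer's implementation (khmer uses a
probabilistic counting table and its own scripts); the function names,
the k and cutoff values, and the example reads are all hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return every overlapping k-mer in seq (empty list if seq is shorter than k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=5, cutoff=3):
    """Toy digital normalization with exact counts (hypothetical helper).

    A read is kept only if the median count of its k-mers, among the
    reads kept so far, is below `cutoff`. K-mers are counted only for
    reads that are kept.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate coverage from
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


if __name__ == "__main__":
    # Made-up reads: one sequence seen 10 times, one seen once.
    high = "ACGTTGCAAT"
    low = "GGATCCTAGA"
    kept = normalize_by_median([high] * 10 + [low], k=5, cutoff=3)
    print(len(kept), "reads kept")
```

Note that a scheme like this caps high coverage but retains low-coverage
reads, so the ~1x nuclear reads would not be removed by normalization
alone.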

cheers,
--titus