[khmer] sandbox/filter-median-and-pct.py error
C. Titus Brown
ctb at msu.edu
Thu Jun 12 03:58:40 PDT 2014
On Wed, May 28, 2014 at 11:27:28AM +0200, Joseph Tran wrote:
> Hello Michael
>
> Thanks for your answer.
>
> I understand now why this script was not functional on the master branch.
> The doc you mentioned was very useful in helping me understand how to use khmer in my context.
>
> In the meantime, I installed khmer using the kmer_error_profile branch, and got the sandbox/filter-median-and-pct.py script to work.
> Can I use this version of the script?
> Or would you recommend that I use the main digital normalization protocol instead?
>
> My thought is that it would be useful to clean and reduce the amount of data in the particular context I am working on.
> I need to assemble several mitochondrial genomes, but the samples are all contaminated with chloroplast and nuclear DNA.
> The nuclear coverage is nearly 1x, so the assembly step should discard the corresponding reads.
> I have nearly 2000x coverage from HiSeq and 60-700x from MiSeq for the mitochondrion and chloroplast.
> The idea was to normalize the coverage for the mitochondrion and the chloroplast so that a single k-mer size could be used in some meta-assembler.
> Then the assembler should be able to distinguish the 2 genomes considering all chimeric or repeat nodes.
>
> If you have any advice, that would be great.
Joseph,
apologies for the delay -- this got lost in my inbox, which can be a very
scary place.
I would strongly recommend following the approach we used here,
http://www.pnas.org/content/early/2014/03/13/1402564111.abstract
which is written up here:
https://khmer-protocols.readthedocs.org/en/latest/metagenomics/index.html
Note that this is still a bit of a work in progress so feedback is welcome!
cheers,
--titus
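
For readers following the thread, the coverage normalization Joseph describes can be illustrated with a toy sketch of median-based digital normalization. This is not khmer's implementation: khmer uses a memory-efficient probabilistic counting structure, whereas the sketch below counts k-mers exactly with a Python Counter, and the function names and the default values of k and cutoff are illustrative assumptions only.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=20):
    """Toy digital normalization (hypothetical helper, not the khmer API).

    A read is kept only if the median count of its k-mers, measured
    against the reads kept so far, is still below `cutoff`. Kept reads
    then add their k-mers to the counts.
    """
    counts = Counter()  # exact counts; khmer uses a probabilistic table instead
    kept = []
    for read in reads:
        words = kmers(read, k)
        if not words:  # read shorter than k: nothing to estimate from
            continue
        if median(counts[w] for w in words) < cutoff:
            counts.update(words)
            kept.append(read)
    return kept
```

With this scheme, reads from very deeply sequenced regions stop being added once their estimated coverage reaches the cutoff, while low-coverage reads pass through unchanged. Normalization on its own therefore does not remove low-coverage material such as the roughly 1x nuclear reads.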