[khmer] khmer read partitioning
C. Titus Brown
ctb at msu.edu
Mon May 13 08:10:01 PDT 2013
Hi Vlad,
small partitions -- partitions with fewer than five reads, I think by
default -- are eliminated, as they are not going to result in anything
assembled.
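As an illustration of the size filter described above, here is a minimal sketch, not khmer's actual code. It assumes reads are already available as hypothetical (read_name, partition_id) pairs, and it takes the default of 5 from the "I think by default" recollection above. Partitions with fewer than `min_size` reads are dropped together with all of their reads:

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

# Hypothetical input shape: (read_name, partition_id) pairs.
Pair = Tuple[str, Hashable]


def filter_small_partitions(
    read_partitions: Iterable[Pair], min_size: int = 5
) -> Tuple[List[Pair], List[Pair]]:
    """Split (read_name, partition_id) pairs into kept and dropped lists.

    A partition is kept only if it holds at least `min_size` reads; every read
    in a smaller partition goes to the dropped list.
    """
    pairs = list(read_partitions)             # materialize so we can scan twice
    sizes = Counter(pid for _, pid in pairs)  # number of reads per partition
    kept = [(name, pid) for name, pid in pairs if sizes[pid] >= min_size]
    dropped = [(name, pid) for name, pid in pairs if sizes[pid] < min_size]
    return kept, dropped


if __name__ == "__main__":
    # Toy input: partition 1 has 6 reads, partition 2 has only 2 reads.
    reads = [(f"r{i}", 1) for i in range(6)] + [("r6", 2), ("r7", 2)]
    kept, dropped = filter_small_partitions(reads)
    print(len(kept), len(dropped))
```

For the numbers in the question, 50,000,000 - 10,641,552 = 39,358,448 reads ended up either in partitions below the threshold or in no partition at all.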
The main place to look is the --min-partition-size parameter of extract-partitions.py.
--min-partition-size. You would also need to modify annotate-partitions to
output reads that don't have any partition, although I think that will only
apply to fairly short reads with no overlap with anything else.
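The change to annotate-partitions could look roughly like the sketch below. It is hypothetical: the function name, the tab-separated "name<TAB>partition" header, and the placeholder ID 0 for unpartitioned reads are all assumptions modeled loosely on the .part output, not khmer's API. A real placeholder would need to be checked against the IDs khmer actually uses. By default, reads with no partition are written out instead of being skipped:

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def annotate_reads(
    reads: Iterable[Tuple[str, str]],
    partition_of: Dict[str, int],
    keep_unassigned: bool = True,
    unassigned_id: int = 0,
) -> Iterator[str]:
    """Yield FASTA records whose header carries a tab-separated partition ID.

    Reads missing from `partition_of` have no partition. With
    keep_unassigned=False they are skipped; otherwise they are written with
    the hypothetical placeholder ID `unassigned_id`.
    """
    for name, seq in reads:
        pid: Optional[int] = partition_of.get(name)
        if pid is None:
            if not keep_unassigned:
                continue  # drop reads that never joined any partition
            pid = unassigned_id  # hypothetical marker for "no partition"
        yield f">{name}\t{pid}\n{seq}\n"


if __name__ == "__main__":
    reads = [("r1", "ACGT"), ("r2", "GGCC")]
    print("".join(annotate_reads(reads, {"r1": 3})), end="")
```

Passing keep_unassigned=False instead skips any read missing from the mapping.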
cheers,
--titus
On Mon, May 13, 2013 at 06:50:35PM +0400, Vlad Saveliev wrote:
> Hello, Mr. Brown,
>
> I'm working on metagenomic assembly and I've tried the khmer tool for graph
> partitioning. I have a question concerning the partitioning
> workflow: https://khmer.readthedocs.org/en/latest/partitioning-big-data.html
>
> Using the iowa-corn-50m data set, I ran the following steps:
> python load-graph.py -k 32 -N 8 -x 16e9 50m iowa-corn-50m.fa
> python partition-graph.py --threads 32 -s 1e5 50m
> python merge-partitions.py 50m
> python annotate-partitions.py 50m iowa-corn-50m.fa
> python extract-partitions.py iowa-corn-50m iowa-corn-50m.fa.part
>
> Initially, there are 50m reads in the data set. After partitioning, I found
> 8 groups containing 10,641,552 reads in total.
>
> I can't understand why 39,358,448 reads are missing. Did I miss an
> important idea here? I've read the paper 'Scaling metagenome sequence
> assembly with probabilistic de Bruijn graphs' and didn't find the answer
> there. Could you help me understand this issue?
>
> Best regards,
> Vlad Saveliev
> St. Petersburg Algorithmic Biology Lab
> bioinf.spbau.ru
--
C. Titus Brown, ctb at msu.edu