[khmer] khmer read partitioning

Mon May 13 08:10:01 PDT 2013

Hi Vlad,

Small partitions -- those with fewer than five reads, I think, by
default -- are eliminated, as they are not going to assemble into
anything.

If you want to keep those reads, the main place to look is the
--min-partition-size parameter of 'extract-partitions'.  You would also
need to modify annotate-partitions to output reads that don't have any
partition, although I think that will only apply to fairly short reads
that don't overlap with anything else.
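
To see how many reads the size cutoff accounts for, something like the
following rough sketch could tally partition sizes from the annotated
file.  It assumes each FASTA header in the .part file has the form
'>read_name<TAB>partition_id'; that format, and the function names
partition_sizes and reads_dropped, are assumptions made for illustration
and are not part of khmer itself.

```python
from collections import Counter


def partition_sizes(lines):
    """Count reads per partition from FASTA header lines.

    Assumption: headers look like '>read_name<TAB>partition_id'.
    Headers without a tab-separated id are skipped.
    """
    sizes = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        name, _, pid = line.rstrip("\n").rpartition("\t")
        if name:  # a tab was present, so pid holds the partition id
            sizes[int(pid)] += 1
    return sizes


def reads_dropped(sizes, min_partition_size=5):
    """Total reads sitting in partitions smaller than the cutoff."""
    return sum(n for n in sizes.values() if n < min_partition_size)


# Tiny made-up example: partition 1 has 5 reads, partition 2 has 2,
# and one read carries no partition id at all.
example_lines = (
    [">r%d\t1\n" % i for i in range(5)]
    + [">r5\t2\n", ">r6\t2\n", ">r7\n"]
)
example_sizes = partition_sizes(example_lines)
example_dropped = reads_dropped(example_sizes)
```

Passing an open file handle for iowa-corn-50m.fa.part in place of
example_lines would give the real counts, provided the header assumption
above holds for your output.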

cheers,
--titus

On Mon, May 13, 2013 at 06:50:35PM +0400, Vlad Saveliev wrote:
> Hello, Mr. Brown,
> 
> I'm working on metagenomic assembly and I've tried the khmer tool for
> graph partitioning. I have a question concerning the partitioning
> workflow: https://khmer.readthedocs.org/en/latest/partitioning-big-data.html
> 
> Using the iowa-corn-50m data set, I ran the following commands:
>      python load-graph.py -k 32 -N 8 -x 16e9 50m iowa-corn-50m.fa
>      python partition-graph.py --threads 32 -s 1e5 50m
>      python merge-partitions.py 50m
>      python annotate-partitions.py 50m iowa-corn-50m.fa
>      python extract-partitions.py iowa-corn-50m iowa-corn-50m.fa.part
> 
> Initially, there are 50m reads in the data set. After partitioning, I found
> 8 groups with 10,641,552 reads in total.
> 
> I can't understand why 39,358,448 reads are missing. Did I miss an
> important idea here? I've read the paper 'Scaling metagenome sequence
> assembly with probabilistic de Bruijn graphs' and didn't find the answer
> there. Could you help me to understand this issue?
> 
> Best regards,
> Vlad Saveliev
> St. Petersburg Algorithmic Biology Lab
> bioinf.spbau.ru

-- 
C. Titus Brown, ctb at msu.edu