[protocols] Khmer produces many partitions

Thu Mar 27 04:39:05 PDT 2014

On Wed, Mar 26, 2014 at 09:31:03PM -0700, Connor Skennerton wrote:
> Hey Khmer team,
> 
> I've been working through your khmer protocols using some metagenomic data that I have and I've just completed the part where you run "do-partition.py" to generate the *.part files. The only thing is though that looking at the terminal output from this script it's wanting to create literally millions of partitions! Have you guys ever seen this kind of thing before?
> 
> I should point out though that my data comes from deep sea sediment so it's more complex than what this protocol was designed for. I found the following link on the khmer documentation (http://khmer.readthedocs.org/en/latest/partitioning-big-data.html) that talks about big datasets. Is this still the preferred workflow for big datasets or do you have some new tricks up your sleeve?

Hi Connor,

check this out:

ivory.idyll.org/blog/2014-assembling-soil.html

I'd love to hear more about sediment... should be as nasty as soil, I
believe!

So, good news? Everything should work. Bad news? You would expect many
partitions.

cheers,
-titus
-- 
C. Titus Brown, ctb at msu.edu