[khmer] Question: -C cutoff for estimating genome size

luoxiao luoxiao at genomics.cn
Mon Nov 10 00:37:09 PST 2014


Dear Colleague,
I am unsure how to set the cutoff [-C] when using the estimate-genome-size.py program.
First, I used plot-abundance-dist.py to plot the 17-mer spectrum, as follows (with xlim and ylim set):


From the picture, I assumed that the k-mers with abundance below 50X, which occur at high frequency, may be derived from sequencing errors.
So I set [-C]=50 when using the estimate-genome-size.py program, and the estimated (meta)genome size is 53602214 bp (our data is from a metagenome and the sequence size is about 5G).
However, following the guidance on the khmer website, I also set [-C]=20, with the other parameters unchanged, when using the estimate-genome-size.py program,
but then the estimated (meta)genome size is 32765613 bp. What a big difference!
So I am confused about how to choose the cutoff [-C]. I hope you can give me some useful advice.
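
To illustrate the reasoning above, here is a minimal sketch of how such a threshold could be read off a k-mer abundance histogram instead of by eye. It only finds the first local minimum between the low-abundance peak and the main peak. The function name first_valley is hypothetical, the histogram values are invented toy numbers (not my data), and it makes no claim about how estimate-genome-size.py itself interprets -C.

```python
def first_valley(hist):
    """Return the abundance at the first local minimum of a k-mer spectrum.

    hist: list of (abundance, count) pairs, sorted by abundance.
    Returns None if the counts never start rising again.
    """
    for i in range(1, len(hist) - 1):
        prev_count = hist[i - 1][1]
        count = hist[i][1]
        next_count = hist[i + 1][1]
        # A valley: lower than the previous bin, not higher than the next one.
        if count < prev_count and count <= next_count:
            return hist[i][0]
    return None


# Toy spectrum with invented numbers: an error-like peak at low abundance,
# a dip, then a second peak around abundance 8.
toy_hist = [
    (1, 9000), (2, 4000), (3, 1500), (4, 600), (5, 300),
    (6, 450), (7, 800), (8, 1200), (9, 900), (10, 500),
]

valley = first_valley(toy_hist)
print("first valley at abundance:", valley)
```

On a real spectrum the counts are noisy, so some smoothing would probably be needed before looking for the minimum.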
Thank you very much!



Best wishes,



Xiao Luo
BGI-Shenzhen China
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/jpeg
Size: 32020 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20141110/bf062135/attachment-0001.jpeg>

