My two cents. Sorry if any of this repeats what you have already mentioned.<div><br><div>There are two levels in khmer: one covers basic/fundamental hash and k-mer manipulation, and the other covers specific applications such as partitioning, diginorm, and even the iGSs stuff I am working on. Clearly there will be other applications built on khmer, both internal and external, and they may or may not be integrated into the khmer package. It is difficult to separate these two levels cleanly, since some application-oriented functions/APIs become the foundation for developing other applications, like the method to get the median k-mer frequency. But I think it is good to keep the distinction in mind. Basically, I think one criterion for distinguishing the two is whether something is worth implementing as an API rather than as a Python script. Scripts can be more application oriented, like much of what is in the sandbox folder. In general, I think development should probably focus more on the first level, the API. </div><div><br></div><div>The API is a big advantage of khmer over many other k-mer counting tools, and it is khmer's real strength. We may need to put more effort into making it stable and better documented. </div><div><br></div><div>One of the pain points is choosing proper parameters for the hash structure. I have put some effort into this, but the result is still not satisfactory. </div><div><br></div><div>Best,</div><div>QP<span></span><br><br>On Saturday, November 15, 2014, C. Titus Brown &lt;<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;ctb@msu.edu&#39;);" target="_blank">ctb@msu.edu</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

as we think about the next few years of khmer development, I think it is helpful to explore what khmer is, roughly speaking, and what our goals should be.<br>

<br>

Here’s a rough cut; I’d like to turn this into a blog post, but only after some feedback from the list (if any).<br>

<br>

----<br>

<br>

khmer is:<br>

<br>

* a stable research platform for novel CS/bio research on data structures and algorithms, mostly k-mer based;<br>

* a test bed for software engineering practice in science;<br>

* a Python library for working with k-mers and graph structures;<br>

* an exercise in community building in scientific software engineering;<br>

* an exercise in ecosystem participation in scientific software engineering;<br>

<br>

----<br>

<br>

khmer long term goals, in some rough order of priority:<br>

<br>

* Keep khmer versatile and agile enough to easily enable the CS and bio we want to do.  Practical implications: limit complexity of internals as much as possible.<br>

<br>

* Continue community building. Practical implications: run khmer as a real open source project, with everything done in the open; work nicely with other projects.<br>

<br>

* Build, sustain, and maintain a set of protocols and recipes around khmer. Practical implications: take workflow design into account.<br>

<br>

* Improve the efficiency (time/memory) of khmer implementations. Practical implications: optimize, but not at the expense of clean code. Some specifics: streaming; variable-sized counters.<br>

<br>

* Lower barriers to an increasing user base. Practical implications: find actual pain points, address if it’s easy or makes good sense. Some specifics: hash function k &gt; 32, stranded hash function, integrate efficient k-mer cardinality counting, implement dynamically sized data structures.<br>

<br>

* Keep khmer technologically up to date. Practical implications: transition to Python 3.<br>

<br>

----<br>

<br>

Thoughts? What am I missing? What should be added or changed?<br>

<br>

cheers,<br>

—titus<br>

<br>

<br>

_______________________________________________<br>

khmer mailing list<br>

<a>khmer@lists.idyll.org</a><br>

<a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>

</blockquote>

</div></div><br><br>-- <br>--<br>Qingpeng Zhang<br><a href="mailto:qingpeng@gmail.com" target="_blank">qingpeng@gmail.com</a><br>