[khmer] Guiding principles for khmer?

Qingpeng Zhang qingpeng at gmail.com
Sat Nov 15 09:00:29 PST 2014


My two cents. Sorry if any of this repeats what you have already mentioned.

There are two levels in khmer: one is basic/fundamental hash and k-mer
manipulation, the other is more about specific applications, like
partitioning, diginorm, and even the iGSs stuff I am working on. Presumably
there will be other applications built on khmer, from inside or outside the
project, and they may or may not be integrated into the khmer package. It's
difficult to distinguish these two levels clearly, since some
application-oriented functions/APIs become the foundation for other
application development, like the method to get the median k-mer frequency.
But I think it is good to keep this distinction in mind. One criterion for
distinguishing the two is whether something is worth implementing as an API
rather than as a Python script. Scripts can be more application oriented,
like much of the stuff in the sandbox folder. Generally, I think development
should focus more on the first level, i.e. the API.
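To illustrate the kind of function that sits between the two levels, here is a minimal sketch of a median k-mer abundance calculation. It uses a plain Python Counter as a stand-in for khmer's counting structure; the helper names (kmers, count_kmers, median_kmer_abundance) are hypothetical and are not khmer's API. Digital normalization, for example, keeps a read only when this median is below a coverage cutoff.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every overlapping k-mer in seq (hypothetical helper)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_kmers(reads, k):
    """Count k-mers across all reads; a Counter stands in for a hash table."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def median_kmer_abundance(read, counts, k):
    """Median abundance of the read's k-mers; 0 if the read is shorter than k."""
    abundances = [counts[km] for km in kmers(read, k)]
    return median(abundances) if abundances else 0


if __name__ == "__main__":
    counts = count_kmers(["ACGTACGT", "ACGTACGA"], 4)
    print(median_kmer_abundance("ACGTACGT", counts, 4))  # → 2
```

Once something like this exists as a library call, scripts such as diginorm can be thin wrappers around it, which is the API-versus-script split described above.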

The API is a big advantage of khmer compared to many other k-mer counting
tools, and it is where khmer's power lies. We may need more effort to make it
stable and better documented.

One of the pain points is how to choose proper parameters for the hash
structures. I have made some efforts on this, but the results are still not
satisfying.
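As a rough illustration of why parameter choice matters, here is a back-of-envelope sketch. It assumes a Bloom-filter-style layout of Z independent tables, one hash function per table, splitting a fixed total number of slots evenly, and holding n distinct k-mers; the false positive rate is then approximately (1 - e^(-n/m))^Z for table size m. The layout and helper names are assumptions for illustration, not khmer's actual sizing code, which may differ in detail (for example in how table sizes are chosen).

```python
import math


def false_positive_rate(n_distinct, table_size, n_tables):
    """Approximate FPR: per-table occupancy raised to the number of tables."""
    occupancy = 1.0 - math.exp(-n_distinct / table_size)
    return occupancy ** n_tables


def best_n_tables(n_distinct, total_slots, max_tables=16):
    """Try Z = 1..max_tables with a fixed memory budget; return (Z, FPR) minimizing FPR."""
    best = None
    for z in range(1, max_tables + 1):
        size = total_slots // z  # each table gets an equal share of the budget
        fpr = false_positive_rate(n_distinct, size, z)
        if best is None or fpr < best[1]:
            best = (z, fpr)
    return best


if __name__ == "__main__":
    z, fpr = best_n_tables(1_000_000, 10_000_000)
    print(z)  # → 7
```

With 10 slots per distinct k-mer the optimum lands near (slots / n) * ln 2, the textbook Bloom filter result; the hard part in practice is that n is rarely known in advance, which is one reason cardinality estimation shows up in the goals below.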

Best,
QP

On Saturday, November 15, 2014, C. Titus Brown <ctb at msu.edu> wrote:

> Hi all,
>
> as we think about the next few years of khmer development, I think it is
> helpful to explore what khmer is, roughly speaking, and what our goals
> should be.
>
> Here’s a rough cut; I’d like to turn this into a blog post, but only after
> some feedback from the list (if any).
>
> ----
>
> khmer is:
>
> * a stable research platform for novel CS/bio research on data structures
> and algorithms, mostly k-mer based;
> * a test bed for software engineering practice in science;
> * a Python library for working with k-mers and graph structures;
> * an exercise in community building in scientific software engineering;
> * an exercise in ecosystem participation in scientific software
> engineering;
>
> ----
>
> khmer long term goals, in some rough order of priority:
>
> * Keep khmer versatile and agile enough to easily enable the CS and bio we
> want to do.  Practical implications: limit complexity of internals as much
> as possible.
>
> * Continue community building. Practical implications: run khmer as a real
> open source project, with everything done in the open; work nicely with
> other projects.
>
> * Build, sustain, and maintain a set of protocols and recipes around
> khmer. Practical implications: take workflow design into account.
>
> * Improve the efficiency (time/memory) of khmer implementations.
> Practical implications: optimize, but not at the expense of clean code. Some
> specifics: streaming; variable sized counters.
>
> * Lower barriers to an increasing user base. Practical implications: find
> actual pain points, address if it’s easy or makes good sense. Some
> specifics: hash function k > 32, stranded hash function, integrate
> efficient k-mer cardinality counting, implement dynamically sized data
> structures.
>
> * Keep khmer technologically up to date. Practical implications:
> transition to Python 3.
>
> ----
>
> Thoughts? What am I missing? What should be added or changed?
>
> cheers,
> —titus
>
>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer
>


-- 
Qingpeng Zhang
qingpeng at gmail.com

