[khmer] Guiding principles for khmer?

Sat Nov 15 06:55:04 PST 2014

Hi all,

as we think about the next few years of khmer development, I think it is helpful to explore what khmer is, roughly speaking, and what our goals should be.

Here’s a rough cut; I’d like to turn this into a blog post, but only after some feedback from the list (if any).

----

khmer is:

* a stable research platform for novel CS/bio research on data structures and algorithms, mostly k-mer based;
* a test bed for software engineering practice in science;
* a Python library for working with k-mers and graph structures;
* an exercise in community building in scientific software engineering;
* an exercise in ecosystem participation in scientific software engineering.

----

khmer long-term goals, in some rough order of priority:

* Keep khmer versatile and agile enough to easily enable the CS and bio we want to do.  Practical implications: limit complexity of internals as much as possible.

* Continue community building. Practical implications: run khmer as a real open source project, with everything done in the open; work nicely with other projects.

* Build, sustain, and maintain a set of protocols and recipes around khmer. Practical implications: take workflow design into account.

* Improve the efficiency (time/memory) of khmer implementations.  Practical implications: optimize, but not at the expense of clean code. Some specifics: streaming; variable-sized counters.

* Lower the barriers to growing the user base. Practical implications: find actual pain points, and address them if it’s easy or makes good sense. Some specifics: hash function support for k > 32, stranded hash function, integrate efficient k-mer cardinality counting, implement dynamically sized data structures.

* Keep khmer technologically up to date. Practical implications: transition to Python 3.

----

Thoughts? What am I missing? What should be added or changed?

cheers,
--titus