[khmer] Guiding principles for khmer?
C. Titus Brown
ctb at msu.edu
Sat Nov 15 06:55:04 PST 2014
Hi all,
as we think about the next few years of khmer development, I think it is helpful to explore what khmer is, roughly speaking, and what our goals should be.
Here’s a rough cut; I’d like to turn this into a blog post, but only after some feedback from the list (if any).
----
khmer is:
* a stable research platform for novel CS/bio research on data structures and algorithms, mostly k-mer based;
* a test bed for software engineering practice in science;
* a Python library for working with k-mers and graph structures;
* an exercise in community building in scientific software engineering;
* an exercise in ecosystem participation in scientific software engineering.
----
khmer long-term goals, in rough order of priority:
* Keep khmer versatile and agile enough to easily enable the CS and bio we want to do. Practical implications: limit complexity of internals as much as possible.
* Continue community building. Practical implications: run khmer as a real open source project, with everything done in the open; work nicely with other projects.
* Build, sustain, and maintain a set of protocols and recipes around khmer. Practical implications: take workflow design into account.
* Improve the efficiency (time/memory) of khmer implementations. Practical implications: optimize, but not at the expense of clean code. Some specifics: streaming; variable-sized counters.
* Lower barriers to an increasing user base. Practical implications: find actual pain points, and address them if it’s easy or makes good sense. Some specifics: hash functions that support k > 32, a stranded hash function, efficient k-mer cardinality counting, dynamically sized data structures.
* Keep khmer technologically up to date. Practical implications: transition to Python 3.
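To make the "k > 32" and "stranded hash function" items above concrete, here is a minimal sketch. It assumes a 2-bit-per-base encoding packed into a 64-bit integer, which is where a k <= 32 ceiling would come from. The function names are hypothetical illustrations, not khmer's API.

```python
# Illustrative sketch only: these helpers are hypothetical, not khmer's API.
# Assumption: each base is packed into 2 bits of a 64-bit integer.

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def encode_kmer(kmer):
    """Stranded encoding: the value depends on the orientation of the k-mer."""
    value = 0
    for base in kmer.upper():
        value = (value << 2) | _CODE[base]
    return value


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]


def encode_canonical(kmer):
    """Strand-agnostic encoding: a k-mer and its reverse complement collide."""
    return min(encode_kmer(kmer), encode_kmer(reverse_complement(kmer)))


def fits_in_64_bits(k):
    """At 2 bits per base, a 64-bit integer holds at most 32 bases."""
    return 2 * k <= 64
```

With this encoding, encode_kmer("AAAC") and encode_kmer("GTTT") differ (stranded), while encode_canonical gives the same value for both. fits_in_64_bits(33) is False, so supporting larger k would need a wider integer type or an actual hash function.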
----
Thoughts? What am I missing? What should be added or changed?
cheers,
—titus