[khmer] Labeled de Bruijn graphs

Guillaume Holley gholley at cebitec.uni-bielefeld.de
Mon Nov 16 07:28:12 PST 2015


Dear khmer developers and users,

I am trying to use khmer to build the labeled de Bruijn graph of a few 
hundred bacterial strains represented as reads in FASTQ files. More 
precisely, I am trying to achieve the following tasks:

- Build the labeled de Bruijn graph.
- Query individual k-mers for their labels.
- Query individual k-mers for the number of successors and predecessors 
they have in the graph.
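
For concreteness, the three tasks above can be sketched in plain Python. 
This is only an illustration of the intended behaviour, not khmer code: 
the function names are hypothetical, the graph is an exact in-memory 
dictionary, and only the forward strand is considered (no reverse 
complements).

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_labeled_graph(labeled_reads, k):
    """Map each k-mer to the set of labels (e.g. strain names) it occurs in."""
    graph = defaultdict(set)
    for label, read in labeled_reads:
        for kmer in kmers(read, k):
            graph[kmer].add(label)
    return dict(graph)


def successors(graph, kmer):
    """k-mers present in the graph that start with the last k-1 bases of kmer."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in graph]


def predecessors(graph, kmer):
    """k-mers present in the graph that end with the first k-1 bases of kmer."""
    return [b + kmer[:-1] for b in "ACGT" if b + kmer[:-1] in graph]


# Hypothetical toy input: (label, read) pairs.
reads = [("strainA", "ACGTAC"), ("strainB", "CGTACG"), ("strainB", "CGTT")]
g = build_labeled_graph(reads, k=3)

print(sorted(g["CGT"]))             # labels of one k-mer
print(len(successors(g, "CGT")))    # number of successors
print(len(predecessors(g, "CGT")))  # number of predecessors
```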

I spent some time reading the khmer documentation and the blog post 
http://ivory.idyll.org/blog/2015-wok-labelhash.html. For the latter, I 
downloaded and installed the code to replicate the results, and I also 
read the code, which gave me a first idea of how to achieve some of the 
tasks mentioned above.
However, I have a few questions, which may be very naive since I am not 
very familiar with khmer.

- Is there any documentation for the "labelhash"/"graphlabels" API? I 
browsed the online documentation of khmer but could not find anything 
about it (maybe I just missed it). Do you know of any examples other 
than the experiment described in 
http://ivory.idyll.org/blog/2015-wok-labelhash.html?

- I saw in the khmer v2.0 announcement that "labelhash" was renamed to 
"graphlabels". However, I could not find anything in the khmer code about 
"graphlabels", only "labelhash". Should I continue using "labelhash"?

- The line "lh = khmer.LabelHash(args.ksize, args.tablesize, 
args.n_tables)" is used to build a labelhash. Is there a way to specify 
that the graph should be built only from k-mers that occur at least x 
times in their files?
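
As an illustration of the filtering meant here, a minimal sketch in plain 
Python follows. It does not call khmer: the function name and the 
"min_count" parameter are hypothetical, and counting is exact and 
in-memory, which would not scale to real read sets.

```python
from collections import Counter


def solid_kmers(reads, k, min_count):
    """Return the k-mers that occur at least min_count times in reads."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


# Hypothetical toy reads from a single file.
reads = ["ACGTA", "ACGTT", "CGTA"]
kept = solid_kmers(reads, k=4, min_count=2)
print(sorted(kept))
```

Only the k-mers in "kept" would then be inserted into the labeled graph.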

- In the example given on the blog to replicate the results, a labelhash 
graph is built from concatenated FASTx files. Is there a way to build 
the graph iteratively from FASTx files, without having to concatenate them?
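
The kind of iterative construction meant here can be sketched as 
follows, again in plain Python rather than khmer. The helper names are 
hypothetical, one label is assumed per input file, and the reader only 
handles uncompressed FASTQ with exactly four lines per record.

```python
import os
import tempfile
from collections import defaultdict


def read_fastq(path):
    """Yield sequences from an uncompressed, 4-lines-per-record FASTQ file."""
    with open(path) as handle:
        for line_no, line in enumerate(handle):
            if line_no % 4 == 1:  # second line of each record is the sequence
                yield line.strip()


def add_file(graph, path, label, k):
    """Add every k-mer of one file to the shared graph under the given label."""
    for seq in read_fastq(path):
        for i in range(len(seq) - k + 1):
            graph[seq[i:i + k]].add(label)


graph = defaultdict(set)
with tempfile.TemporaryDirectory() as tmp:
    # Write two tiny example files, then add them one at a time.
    for name, seq in [("strainA", "ACGTA"), ("strainB", "CGTAC")]:
        path = os.path.join(tmp, name + ".fastq")
        with open(path, "w") as out:
            out.write("@r1\n%s\n+\n%s\n" % (seq, "I" * len(seq)))
        add_file(graph, path, label=name, k=4)

print({kmer: sorted(labels) for kmer, labels in graph.items()})
```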

- I am currently working with values of k larger than 32. Is there a way 
to use khmer with k > 32?
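
A limit of 32 is what one would expect if each k-mer is packed into a 
single 64-bit integer at 2 bits per base. That packing is an assumption 
about the implementation made here for illustration, and the names below 
are hypothetical.

```python
# 2-bit code per nucleotide.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_kmer(kmer):
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def fits_in_64_bits(k):
    """A k-mer needs 2*k bits, so it fits in a 64-bit word only if k <= 32."""
    return 2 * k <= 64


print(pack_kmer("ACGT"))
print(fits_in_64_bits(32), fits_in_64_bits(33))
```

Python integers are unbounded, so "pack_kmer" itself accepts any k; the 
64-bit constraint only applies to fixed-width integer types.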

Thank you very much in advance for your time and help.

Best regards,

Guillaume Holley, PhD Student
Faculty of Technology, Genome Informatics group, Bielefeld University
Bielefeld, Germany


