[khmer] Using khmer for producing k-mer frequency distribution

Tue Aug 27 14:27:34 PDT 2013

Thanks so much. I downloaded and compiled the latest version. 'make test'
reported 'ok' for everything. However, when I tried to run the script, I got
the following message:

python load-into-counting.py -k 31 -x 5e10 out.kh 1Mreads.fa
Traceback (most recent call last):
  File "load-into-counting.py", line 13, in <module>
    from khmer.counting_args import build_construct_args, report_on_config
ImportError: cannot import name report_on_config
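
One possible cause (an assumption, not something confirmed in this thread) is
that Python is importing an older copy of khmer than the one just compiled, so
khmer.counting_args lacks the newer name. A minimal sketch for checking which
file the module is loaded from and whether it defines report_on_config; the
helper name locate_name is hypothetical, and the syntax is Python 3:

```python
import importlib


def locate_name(module_name, attr_name):
    """Return (path of the module file, whether attr_name is defined in it)."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)
    return path, hasattr(module, attr_name)


if __name__ == "__main__":
    try:
        # Module and attribute names are taken from the traceback above.
        path, found = locate_name("khmer.counting_args", "report_on_config")
    except ImportError as exc:
        print("could not import khmer.counting_args:", exc)
    else:
        print("khmer.counting_args loaded from:", path)
        print("defines report_on_config:", found)
```

If the printed path points somewhere other than the freshly built checkout,
a stale installation is probably shadowing it.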

On Tue, Aug 27, 2013 at 4:41 PM, C. Titus Brown <ctb at msu.edu> wrote:

> Hi Rajat,
>
> sorry for the long delay in response!
>
> On Thu, Jul 18, 2013 at 03:32:39PM -0400, Rajat Shuvro Roy wrote:
> > Hello Prof Brown,
> > I was attempting to produce a k-mer frequency distribution using khmer and
> > followed the instructions in
> > (http://khmer.readthedocs.org/en/latest/scripts.html). I have a Zea mays
> > library (SRR404240, 95.8 Gbp) and I executed the following command.
> >
> > python load-into-counting.py -k 31 -x 5e10 out.kh SRR404240.fasta
> >
> > I believe this counts k-mer frequencies, and the script abundance-dist.py
> > produces the distribution.
> >
> > We stopped it after it had run for 2464 mins (41 hrs) using 187 GB of space.
> > I tried smaller values for -x, but the computation still failed to complete
> > in less than 3 days. Could you please let us know whether this is expected
> > and whether we should allow more time? And is there a more efficient way of
> > using khmer?
>
> Your e-mail actually triggered some doc changes and updates ;).
>
> Briefly, khmer can count k-mers in either constant-memory mode or in
> accurate-large-counts mode.  In the former, counts above 255 will
> stop being counted, but the memory specified with the -N and -x parameters
> will be the total amount used; in the latter mode (which is the default),
> counts above 255 will be kept and memory use will expand indefinitely.
>
> You can use these modes easily in the latest khmer, the bleeding-edge
> branch; you can get that like so:
>
>         git clone https://github.com/ged-lab/khmer.git -b bleeding-edge
>
> Then use 'load-into-counting.py -b' to build the tables, and
> 'abundance-dist.py' to generate the output.
>
> I'd suggest running it on a small test data set (data/25k.fq.gz, in the
> khmer repo) just to make sure it all works for you, but it should - we use
> this regularly.
>
> Please let me know if you have any questions, and again, apologies for
> the delay!
>
> cheers,
> --titus
> --
> C. Titus Brown, ctb at msu.edu
>
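
The two counting modes described in the quoted reply can be illustrated with a
toy sketch. This is not khmer's implementation: the class ToyCounter, its
bigcount parameter, the single byte table, and the CRC32 hash are all
assumptions made for illustration. It only shows why one mode keeps memory
fixed while capping counts at 255, and why the other keeps exact large counts
at the cost of memory that can grow.

```python
import zlib

MAX_COUNT = 255  # largest value a single byte can hold


class ToyCounter:
    """Toy k-mer counter: fixed byte table plus an optional overflow dict."""

    def __init__(self, table_size, bigcount=False):
        self.table = bytearray(table_size)  # fixed memory, one byte per slot
        self.bigcount = bigcount
        self.overflow = {}  # only used, and only grows, when bigcount is True

    def _slot(self, kmer):
        # Deterministic hash; distinct k-mers may collide and share a slot.
        return zlib.crc32(kmer.encode("ascii")) % len(self.table)

    def add(self, kmer):
        slot = self._slot(kmer)
        if self.table[slot] < MAX_COUNT:
            self.table[slot] += 1
        elif self.bigcount:
            # Byte is saturated: continue the count in the overflow dict.
            self.overflow[kmer] = self.overflow.get(kmer, MAX_COUNT) + 1
        # Otherwise the count stays at 255 (constant-memory behaviour).

    def get(self, kmer):
        if self.bigcount and kmer in self.overflow:
            return self.overflow[kmer]
        return self.table[self._slot(kmer)]


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))
```

For example, adding the same k-mer 300 times gives 255 from a
ToyCounter(1024) and 300 from a ToyCounter(1024, bigcount=True); only the
second one allocates anything beyond the fixed table.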