[khmer] Using khmer for producing k-mer frequency distribution

C. Titus Brown ctb at msu.edu
Tue Aug 27 14:29:45 PDT 2013

Hmm, make sure you've deleted any old versions of khmer. What does 'make test' report in the top-level khmer directory?

C. Titus Brown, ctb at msu.edu
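One plausible reading of the ImportError in the quoted message below is that Python is importing an older khmer installation instead of the freshly built one. The following minimal sketch (not part of khmer) can check that. It reports which file a module is actually imported from and whether `khmer.counting_args` defines `report_on_config`. The helper names `locate_module` and `defines_attr` are hypothetical. Run it with the same `python` used to launch `load-into-counting.py`; it is written to work under Python 2.7 or 3.

```python
import importlib


def locate_module(name):
    """Return the file Python imports `name` from, or None if it can't be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


def defines_attr(module_name, attr):
    """Return True if `module_name` imports cleanly and defines `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Compare this path with the directory you just built; a different
    # location (e.g. an old site-packages install) suggests a stale copy.
    print("khmer imported from: %s" % locate_module("khmer"))
    print("khmer.counting_args defines report_on_config: %s"
          % defines_attr("khmer.counting_args", "report_on_config"))
```

If the printed path does not point into the new checkout, removing the older installation is the step suggested above.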

On Aug 27, 2013, at 17:27, Rajat Shuvro Roy <rajatroy at cs.rutgers.edu> wrote:

> Thanks so much. I downloaded and compiled the latest version, and 'make test' reported 'ok' for everything. However, when I try to run it, I get the following message:
> python load-into-counting.py -k 31 -x 5e10 out.kh 1Mreads.fa 
> Traceback (most recent call last):
>   File "load-into-counting.py", line 13, in <module>
>     from khmer.counting_args import build_construct_args, report_on_config
> ImportError: cannot import name report_on_config
> On Tue, Aug 27, 2013 at 4:41 PM, C. Titus Brown <ctb at msu.edu> wrote:
>> Hi Rajat,
>> sorry for the long delay in responding!
>> On Thu, Jul 18, 2013 at 03:32:39PM -0400, Rajat Shuvro Roy wrote:
>> > Hello Prof Brown,
>> > I was attempting to produce a k-mer frequency distribution using khmer and
>> > followed the instructions at
>> > http://khmer.readthedocs.org/en/latest/scripts.html. I have a Zea mays
>> > library (SRR404240, 95.8 Gbp) and I executed the following command.
>> >
>> > python load-into-counting.py -k 31 -x 5e10 out.kh SRR404240.fasta
>> >
>> > I believe this counts k-mer frequencies, and the script abundance-dist.py
>> > then produces the distribution.
>> >
>> > We stopped it after it had run for 2464 min (41 h), using 187 GB of space.
>> > I tried smaller values for -x, but the computation still failed to complete
>> > in less than 3 days. Could you please let us know whether this is expected
>> > and we should allow more time? And is there a more efficient way of using khmer?
>> Your e-mail actually triggered some doc changes and updates ;).
>> Briefly, khmer can count k-mers in either constant-memory mode or
>> accurate-large-counts mode. In the former, counts stop increasing once they
>> reach 255, but the memory specified with the -N and -x parameters is the
>> total amount used; in the latter mode (which is the default), counts above
>> 255 are kept, and memory use can grow without a fixed limit.
>> You can use these modes easily in the latest khmer, the bleeding-edge
>> branch; you can get that like so:
>>         git clone https://github.com/ged-lab/khmer.git -b bleeding-edge
>> Then use 'load-into-counting.py -b' to build the tables in constant-memory
>> mode, and 'abundance-dist.py' to generate the output.
>> I'd suggest running it on a small test data set (data/25k.fq.gz, in the
>> khmer repo) just to make sure it all works for you, but it should; we use
>> this regularly.
>> Please let me know if you have any questions, and again, apologies for
>> the delay!
>> cheers,
>> --titus
>> --
>> C. Titus Brown, ctb at msu.edu
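The two counting modes and the memory figures in this thread can be illustrated with a small toy model. The sketch below is hypothetical and is not khmer's implementation. It assumes N tables of one-byte counters, which is consistent with the 255 cap described above, and it estimates total table memory as N times x. The default of four tables used in the example is an assumption about this khmer version; the thread does not state it. Under that assumption, -x 5e10 gives 4 * 5e10 = 2e11 bytes, about 186 GiB. That is in the same range as the 187GB reported above, though the thread does not say how that figure was measured. The names `estimate_table_bytes` and `ToyCountingTable` are invented for illustration.

```python
import hashlib


def estimate_table_bytes(n_tables, table_size):
    """Estimate table memory, assuming n_tables tables of one-byte counters."""
    return int(n_tables * table_size)


class ToyCountingTable:
    """Hypothetical toy model of k-mer counting with saturating 8-bit counters.

    This is an illustration only, not khmer's actual data structure.
    """

    MAX_COUNT = 255  # largest value a one-byte counter can hold

    def __init__(self, table_sizes, bigcount=False):
        # Fixed memory: one bytearray of counters per table.
        self.tables = [bytearray(size) for size in table_sizes]
        self.bigcount = bigcount
        # Exact counts past 255 (bigcount mode only); this dict can keep growing.
        self.overflow = {}

    @staticmethod
    def _hash(kmer):
        return int(hashlib.md5(kmer.encode("ascii")).hexdigest(), 16)

    def _min_count(self, h):
        # Collisions can only inflate a counter, so the minimum across
        # tables is the least-overestimated value.
        return min(table[h % len(table)] for table in self.tables)

    def add(self, kmer):
        h = self._hash(kmer)
        if self.bigcount and self._min_count(h) == self.MAX_COUNT:
            # All counters saturated: record the exact count separately.
            self.overflow[kmer] = self.overflow.get(kmer, self.MAX_COUNT) + 1
            return
        for table in self.tables:
            i = h % len(table)
            if table[i] < self.MAX_COUNT:  # saturate instead of wrapping around
                table[i] += 1

    def get(self, kmer):
        h = self._hash(kmer)
        count = self._min_count(h)
        if self.bigcount and count == self.MAX_COUNT:
            return self.overflow.get(kmer, count)
        return count


if __name__ == "__main__":
    total = estimate_table_bytes(4, 5e10)  # assumed 4 tables, -x 5e10
    print("%d bytes, about %.1f GiB" % (total, total / 2.0 ** 30))

    for bigcount in (False, True):
        table = ToyCountingTable([1009, 1013], bigcount=bigcount)
        for _ in range(300):
            table.add("ACGTACGT")
        print("bigcount=%s: count = %d" % (bigcount, table.get("ACGTACGT")))
```

With bigcount=False the toy count for a k-mer seen 300 times stops at 255 and memory stays fixed. With bigcount=True it reaches 300, at the cost of an extra dictionary entry for every k-mer that passes the cap. This is the trade-off between the two modes described above.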