[khmer] Using khmer for producing k-mer frequency distribution

Tue Aug 27 14:29:45 PDT 2013

Hmm, make sure you've deleted old versions of Khmer. What does 'make test' report in the top Khmer directory?
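
One way to check for a stale install is to ask Python which file it would actually load for a module. This is a minimal sketch, assuming Python 3 (the quoted traceback looks like Python 2, where running `import khmer; print khmer.__file__` serves the same purpose). `module_origin` is a hypothetical helper name, not part of khmer:

```python
import importlib.util


def module_origin(name):
    """Return the file Python would load for `name`, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # If these paths point outside the freshly built source tree,
    # an older installed copy is probably shadowing the new one.
    for name in ("khmer", "khmer.counting_args"):
        print(name, "->", module_origin(name))
```

If the reported path is in site-packages rather than the new checkout, removing that copy (or reinstalling from the new tree) is the likely fix.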

---
C. Titus Brown, ctb at msu.edu

On Aug 27, 2013, at 17:27, Rajat Shuvro Roy <rajatroy at cs.rutgers.edu> wrote:

> Thanks so much. I downloaded and compiled the latest version. make test resulted in 'ok' for everything. However, when I tried to run it, I get the following message:
> 
> python load-into-counting.py -k 31 -x 5e10 out.kh 1Mreads.fa 
> Traceback (most recent call last):
>   File "load-into-counting.py", line 13, in <module>
>     from khmer.counting_args import build_construct_args, report_on_config
> ImportError: cannot import name report_on_config
> 
> 
> 
> On Tue, Aug 27, 2013 at 4:41 PM, C. Titus Brown <ctb at msu.edu> wrote:
>> Hi Rajat,
>> 
>> sorry for the long delay in responding!
>> 
>> On Thu, Jul 18, 2013 at 03:32:39PM -0400, Rajat Shuvro Roy wrote:
>> > Hello Prof Brown,
>> > I was attempting to produce a k-mer frequency distribution using khmer and
>> > followed the instructions in
>> > (http://khmer.readthedocs.org/en/latest/scripts.html). I have a Zea mays
>> > library (SRR404240, 95.8 Gbp) and I executed the following command.
>> >
>> > python load-into-counting.py -k 31 -x 5e10 out.kh SRR404240.fasta
>> >
>> > I believe this counts k-mer frequencies, and the script abundance-dist.py
>> > then produces the distribution.
>> >
>> > We stopped it after it had run for 2464 mins (41 hrs) using 187GB of space. I
>> > tried smaller values for -x, but the computation still failed to complete in
>> > less than 3 days. Could you please let us know whether this is expected and
>> > whether we should allow more time? And is there a more efficient way of using khmer?
>> 
>> Your e-mail actually triggered some doc changes and updates ;).
>> 
>> Briefly, khmer can count k-mers in either constant-memory mode or in
>> accurate-large-counts mode.  In the former, counts saturate at 255 (anything
>> above that is not counted), but the memory specified with the -N and -x
>> parameters is the total amount used; in the latter mode (which is the
>> default), counts above 255 are kept and memory use can grow without bound.
>> 
>> You can use these modes easily in the latest khmer, the bleeding-edge
>> branch; you can get that like so:
>> 
>>         git clone https://github.com/ged-lab/khmer.git -b bleeding-edge
>> 
>> Then use 'load-into-counting.py -b' to build the tables, and
>> 'abundance-dist.py' to generate the output.
>> 
>> I'd suggest running it on a small test data set (data/25k.fq.gz, in the
>> khmer repo) just to make sure it all works for you, but it should - we use
>> this regularly.
>> 
>> Please let me know if you have any questions, and again, apologies for
>> the delay!
>> 
>> cheers,
>> --titus
>> --
>> C. Titus Brown, ctb at msu.edu
> 
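
A rough sketch of the memory arithmetic and the two counting modes discussed in the thread. It assumes one byte per counter and four tables when -N is not given (an assumption about khmer's defaults at the time; check your version). The names `table_memory_bytes` and `SketchCounter` are illustrative, and a plain dict stands in for khmer's hash tables, so this is not khmer's implementation:

```python
def table_memory_bytes(table_size, n_tables=4, bytes_per_counter=1):
    """Estimate fixed table memory: -x (table_size) times -N (n_tables)."""
    return int(table_size) * n_tables * bytes_per_counter


class SketchCounter:
    """Toy exact counter illustrating capped vs. uncapped counting."""

    CAP = 255  # largest value a one-byte counter can hold

    def __init__(self, bigcount=True):
        self.bigcount = bigcount  # True: keep counts above CAP (assumed default)
        self.small = {}           # stands in for the fixed-size byte tables
        self.overflow = {}        # grows as more k-mers exceed CAP

    def add(self, kmer):
        count = self.small.get(kmer, 0)
        if count < self.CAP:
            self.small[kmer] = count + 1
        elif self.bigcount:
            # Accurate-large-counts mode: track the excess in a growing map.
            self.overflow[kmer] = self.overflow.get(kmer, self.CAP) + 1
        # Constant-memory mode: the count simply stays at CAP.

    def get(self, kmer):
        if kmer in self.overflow:
            return self.overflow[kmer]
        return self.small.get(kmer, 0)


if __name__ == "__main__":
    total = table_memory_bytes(5e10)
    print(f"-x 5e10 with 4 tables: {total / 2**30:.1f} GiB")
```

Under these assumptions, -x 5e10 works out to 2e11 bytes, or about 186 GiB, which is roughly in line with the 187GB reported in the original question.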