[khmer] Using khmer for producing k-mer frequency distribution

C. Titus Brown ctb at msu.edu
Tue Aug 27 18:51:27 PDT 2013


On Tue, Aug 27, 2013 at 05:35:49PM -0400, Rajat Shuvro Roy wrote:
> The new version is in a complete new directory. make test gives:

OK, all the tests pass, including the ones that run normalize-by-median. In
that case it's almost certainly a problem with your PYTHONPATH -- make sure
it points to the new directory's 'python' subdirectory.

Do:

% python -c "import khmer; print khmer"

to see where khmer is being imported from -- it should be the new location.
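
The same check can be sketched as a small script. This is only an illustration and assumes Python 3, where the one-liner above would need print(khmer) with parentheses. The helper names module_origin and imported_from are hypothetical and not part of khmer, and 'json' is used as a stand-in module so the sketch runs on a machine without khmer installed.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def imported_from(name, expected_dir):
    """Return True if the module resolves to a file inside expected_dir."""
    origin = module_origin(name)
    if origin is None or not os.path.isabs(origin):
        # Not found, or a built-in/frozen module that has no file path.
        return False
    expected = os.path.realpath(expected_dir)
    resolved = os.path.realpath(origin)
    return os.path.commonpath([resolved, expected]) == expected


if __name__ == "__main__":
    # 'json' is a stand-in; substitute 'khmer' on your own machine.
    print(module_origin("json"))
```

If the printed path is not under the new checkout's 'python' subdirectory, an older copy is shadowing it on PYTHONPATH.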

And yes, fixing installation is in the near future :)

cheers,
--titus

> 
> make test
> cd lib && \
> make
> make[1]: Entering directory `/u2/home/rajatroy/khmer/lib'
> make[1]: Nothing to be done for `all'.
> make[1]: Leaving directory `/u2/home/rajatroy/khmer/lib'
> cd python && \
> make    DEFINE_KHMER_EXTRA_SANITY_CHECKS="" \
>         CXX_DEBUG_FLAGS=""
> make[1]: Entering directory `/u2/home/rajatroy/khmer/python'
> python setup.py build_ext -i
> running build_ext
> copying build/lib.linux-x86_64-2.7/khmer/_khmermodule.so -> khmer
> make[1]: Leaving directory `/u2/home/rajatroy/khmer/python'
> nosetests -v -x -a \!known_failing
> tests.test_align.test_alignnocov ... ok
> tests.test_align.test_readalign ... ok
> tests.test_align.test_alignerrorregion ... ok
> tests.test_c_wrapper.test_raise_in_consume_fasta ... ok
> tests.test_c_wrapper.test_raise_in_fasta_file_to_minmax ... ok
> tests.test_counting_hash.Test_CountingHash.test_collision_1 ... ok
> tests.test_counting_hash.Test_CountingHash.test_collision_2 ... ok
> tests.test_counting_hash.Test_CountingHash.test_collision_3 ... ok
> tests.test_counting_hash.test_3_tables ... ok
> tests.test_counting_hash.test_simple_median ... ok
> tests.test_counting_hash.test_simple_kadian ... ok
> tests.test_counting_hash.test_simple_kadian_2 ... ok
> tests.test_counting_hash.test_2_kadian ... ok
> tests.test_counting_hash.test_save_load ... ok
> tests.test_counting_hash.test_load_gz ... ok
> tests.test_counting_hash.test_save_load_gz ... ok
> tests.test_counting_hash.test_trim_full ... ok
> tests.test_counting_hash.test_trim_short ... ok
> tests.test_counting_hash.test_maxcount ... ok
> tests.test_counting_hash.test_maxcount_with_bigcount ... ok
> tests.test_counting_hash.test_maxcount_with_bigcount_save ... ok
> tests.test_counting_hash.test_bigcount_save ... ok
> tests.test_counting_hash.test_nobigcount_save ... ok
> tests.test_counting_hash.test_bigcount_abund_dist ... ok
> tests.test_counting_hash.test_bigcount_abund_dist_2 ... ok
> tests.test_counting_hash.test_bigcount_overflow ... ok
> tests.test_counting_hash.test_get_ksize ... ok
> tests.test_counting_hash.test_get_hashsizes ... ok
> tests.test_counting_single.Test_AbundanceDistribution.test_count_A ... ok
> tests.test_counting_single.Test_ConsumeString.test_abundance_by_pos ... ok
> tests.test_counting_single.Test_ConsumeString.test_abundance_by_pos_bigcount ... ok
> tests.test_counting_single.Test_ConsumeString.test_bounded ... ok
> tests.test_counting_single.Test_ConsumeString.test_bounded_2 ... ok
> tests.test_counting_single.Test_ConsumeString.test_bounded_2_rc ... ok
> tests.test_counting_single.Test_ConsumeString.test_bounded_rc ... ok
> tests.test_counting_single.Test_ConsumeString.test_max_count ... ok
> tests.test_counting_single.Test_ConsumeString.test_max_count_in_bound ... ok
> tests.test_counting_single.Test_ConsumeString.test_max_count_out_bound ... ok
> tests.test_counting_single.Test_ConsumeString.test_min_count ... ok
> tests.test_counting_single.Test_ConsumeString.test_min_count_in_bound ... ok
> tests.test_counting_single.Test_ConsumeString.test_min_count_out_bound ... ok
> tests.test_counting_single.Test_ConsumeString.test_n_occupied ... ok
> tests.test_counting_single.Test_ConsumeString.test_n_occupied_args ... ok
> tests.test_counting_single.Test_ConsumeString.test_simple ... ok
> tests.test_counting_single.Test_ConsumeString.test_simple_2 ... ok
> tests.test_counting_single.Test_ConsumeString.test_simple_rc ... ok
> tests.test_counting_single.test_no_collision ... ok
> tests.test_counting_single.test_collision ... ok
> tests.test_counting_single.test_complete_no_collision ... ok
> tests.test_counting_single.test_complete_2_collision ... ok
> tests.test_counting_single.test_complete_4_collision ... ok
> tests.test_counting_single.test_maxcount ... ok
> tests.test_counting_single.test_maxcount_with_bigcount ... ok
> tests.test_counting_single.test_consume_uniqify_first ... ok
> tests.test_counting_single.test_maxcount_consume ... ok
> tests.test_counting_single.test_maxcount_consume_with_bigcount ... ok
> tests.test_counting_single.test_get_mincount ... ok
> tests.test_counting_single.test_get_maxcount ... ok
> tests.test_counting_single.test_get_maxcount_rc ... ok
> tests.test_counting_single.test_get_mincount_rc ... ok
> tests.test_counting_single.test_64bitshift ... ok
> tests.test_counting_single.test_64bitshift_2 ... ok
> tests.test_counting_single.test_very_short_read ... ok
> tests.test_filter.Test_Filter.test_abund ... ok
> tests.test_filter.test_filter_sodd ... ok
> tests.test_functions.test_forward_hash ... ok
> tests.test_functions.test_forward_hash_no_rc ... ok
> tests.test_functions.test_reverse_hash ... ok
> tests.test_functions.test_get_primes ... ok
> tests.test_graph.Test_ExactGraphFu.test_counts ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_next_a ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_next_c ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_next_g ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_next_t ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_a ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_c ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_g ... ok
> tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_t ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_next_a ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_next_c ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_next_g ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_next_t ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_a ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_c ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_g ... ok
> tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_t ... ok
> tests.test_graph.Test_Partitioning.test_connected_20_a ... ok
> tests.test_graph.Test_Partitioning.test_connected_20_b ... ok
> tests.test_graph.Test_Partitioning.test_connected_31_c ... ok
> tests.test_graph.Test_Partitioning.test_disconnected_20_a ... ok
> tests.test_graph.Test_Partitioning.test_disconnected_20_b ... ok
> tests.test_graph.Test_Partitioning.test_disconnected_31_c ... ok
> tests.test_graph.Test_Partitioning.test_not_output_unassigned ... ok
> tests.test_graph.Test_Partitioning.test_output_unassigned ... ok
> tests.test_graph.Test_PythonAPI.test_ordered_connect ... ok
> tests.test_hashbits.test__get_set_tag_density ... ok
> tests.test_hashbits.test_n_occupied_1 ... ok
> tests.test_hashbits.test_bloom_python_1 ... ok
> tests.test_hashbits.test_bloom_c_1 ... ok
> tests.test_hashbits.test_n_occupied_2 ... ok
> tests.test_hashbits.test_bloom_c_2 ... ok
> tests.test_hashbits.test_filter_if_present ... ok
> tests.test_hashbits.test_combine_pe ... ok
> tests.test_hashbits.test_load_partitioned ... ok
> tests.test_hashbits.test_count_within_radius_simple ... ok
> tests.test_hashbits.test_count_within_radius_big ... ok
> tests.test_hashbits.test_count_kmer_degree ... ok
> tests.test_hashbits.test_find_radius_for_volume ... ok
> tests.test_hashbits.test_circumference ... ok
> tests.test_hashbits.test_save_load_tagset ... ok
> tests.test_hashbits.test_save_load_tagset_noclear ... ok
> tests.test_hashbits.test_stop_traverse ... ok
> tests.test_hashbits.test_tag_across_stoptraverse ... ok
> tests.test_hashbits.test_notag_across_stoptraverse ... ok
> tests.test_hashbits.test_find_stoptags ... ok
> tests.test_hashbits.test_find_stoptags2 ... ok
> tests.test_hashbits.test_get_ksize ... ok
> tests.test_hashbits.test_get_hashsizes ... ok
> tests.test_hashbits.test_extract_unique_paths_0 ... ok
> tests.test_hashbits.test_extract_unique_paths_1 ... ok
> tests.test_hashbits.test_extract_unique_paths_2 ... ok
> tests.test_hashbits.test_extract_unique_paths_3 ... ok
> tests.test_hashbits.test_extract_unique_paths_4 ... ok
> tests.test_hashbits.test_find_unpart ... ok
> tests.test_hashbits.test_find_unpart_notraverse ... ok
> tests.test_hashbits.test_find_unpart_fail ... ok
> tests.test_hashbits.test_simple_median ... ok
> Verify that 'has_extra_sanity_checks' exists. ... ok
> Verify that all of the various attributes exist. ... ok
> Verify that all of the various attributes exist. ... ok
> Verify that all of the various attributes exist. ... ok
> Verify that all of the various attributes exist. ... ok
> Verify that the number of threads set is what is reported. ... ok
> Verify that the reads file chunk size is what is reported. ... ok
> tests.test_ktable.Test_KTable.test_basic ... ok
> tests.test_ktable.Test_KTable.test_clear ... ok
> tests.test_ktable.Test_KTable.test_consume ... ok
> tests.test_ktable.Test_KTable.test_hash ... ok
> tests.test_ktable.Test_KTable.test_intersection ... ok
> tests.test_ktable.Test_KTable.test_operator_in ... ok
> tests.test_ktable.Test_KTable.test_populate ... ok
> tests.test_ktable.Test_KTable.test_update ... ok
> tests.test_ktable.test_rc ... ok
> tests.test_ktable.test_KmerCount ... ok
> tests.test_lump.test_fakelump_together ... ok
> tests.test_lump.test_fakelump_stop ... ok
> tests.test_lump.test_fakelump_stop2 ... ok
> tests.test_lump.test_fakelump_repartitioning ... ok
> tests.test_minmax.Test_Basic.test_max_1 ... ok
> tests.test_minmax.Test_Basic.test_max_2 ... ok
> tests.test_minmax.Test_Basic.test_merge_1 ... ok
> tests.test_minmax.Test_Basic.test_merge_2 ... ok
> tests.test_minmax.Test_Basic.test_merge_3 ... ok
> tests.test_minmax.Test_Basic.test_merge_4 ... ok
> tests.test_minmax.Test_Basic.test_min_1 ... ok
> tests.test_minmax.Test_Basic.test_min_2 ... ok
> tests.test_minmax.Test_Basic.test_tablesize ... ok
> tests.test_minmax.Test_Filestuff.test_save_no_load ... ok
> tests.test_minmax.Test_Filestuff.test_saveload ... ok
> tests.test_read_parsers.test_read_properties ... ok
> tests.test_read_parsers.test_with_default_arguments ... ok
> tests.test_read_parsers.test_gzip_decompression ... ok
> tests.test_read_parsers.test_bzip2_decompression ... ok
> tests.test_read_parsers.test_with_multiple_threads ... ok
> tests.test_read_parsers.test_old_illumina_pair_mating ... ok
> tests.test_read_parsers.test_casava_1_8_pair_mating ... ok
> tests.test_read_parsers.test_iterator_identities ... ok
> tests.test_read_parsers.test_read_pair_iterator_in_error_mode_xfail ... ok
> tests.test_scripts.test_load_into_counting ... ok
> tests.test_scripts.test_load_into_counting_fail ... ok
> tests.test_scripts.test_filter_abund_1 ... ok
> tests.test_scripts.test_filter_abund_2 ... ok
> tests.test_scripts.test_filter_abund_3_fq_retained ... ok
> tests.test_scripts.test_filter_abund_1_singlefile ... ok
> tests.test_scripts.test_filter_abund_4_retain_low_abund ... ok
> tests.test_scripts.test_filter_abund_5_trim_high_abund ... ok
> tests.test_scripts.test_filter_abund_6_trim_high_abund_Z ... ok
> tests.test_scripts.test_filter_stoptags ... ok
> tests.test_scripts.test_normalize_by_median ... ok
> tests.test_scripts.test_normalize_by_median_2 ... ok
> tests.test_scripts.test_normalize_by_median_paired ... ok
> tests.test_scripts.test_normalize_by_median_impaired ... ok
> tests.test_scripts.test_normalize_by_median_force ... ok
> tests.test_scripts.test_normalize_by_median_dumpfrequency ... ok
> tests.test_scripts.test_normalize_by_median_empty ... ok
> tests.test_scripts.test_count_median ... ok
> tests.test_scripts.test_load_graph ... ok
> tests.test_scripts.test_load_graph_no_tags ... ok
> tests.test_scripts.test_load_graph_fail ... ok
> tests.test_scripts.test_partition_graph_1 ... ok
> tests.test_scripts.test_partition_graph_nojoin_k21 ... ok
> tests.test_scripts.test_partition_graph_nojoin_stoptags ... ok
> tests.test_scripts.test_partition_graph_big_traverse ... ok
> tests.test_scripts.test_partition_graph_no_big_traverse ... ok
> tests.test_scripts.test_annotate_partitions ... ok
> tests.test_scripts.test_annotate_partitions_2 ... ok
> tests.test_scripts.test_extract_partitions ... ok
> tests.test_scripts.test_abundance_dist ... ok
> tests.test_scripts.test_abundance_dist_nobigcount ... ok
> tests.test_scripts.test_abundance_dist_single ... ok
> tests.test_scripts.test_abundance_dist_single_nobigcount ... ok
> tests.test_scripts.test_do_partition ... ok
> tests.test_scripts.test_do_partition_2 ... ok
> tests.test_scripts.test_interleave_reads_1_fq ... ok
> tests.test_scripts.test_interleave_reads_2_fa ... ok
> tests.test_scripts.test_extract_paired_reads_1_fa ... ok
> tests.test_scripts.test_extract_paired_reads_2_fq ... ok
> tests.test_scripts.test_split_paired_reads_1_fa ... ok
> tests.test_scripts.test_split_paired_reads_2_fq ... ok
> tests.test_split.test_2_split ... ok
> tests.test_split.test_n_split ... ok
> tests.test_split.test_n3_split ... ok
> tests.test_subset_graph.Test_RandomData.test_3_merge_013 ... ok
> tests.test_subset_graph.Test_RandomData.test_3_merge_023 ... ok
> tests.test_subset_graph.Test_RandomData.test_5_merge_046 ... ok
> tests.test_subset_graph.Test_RandomData.test_random_20_a_succ ... ok
> tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_II ... ok
> tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_III ... ok
> tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_IV ... ok
> tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_IV_save ... ok
> tests.test_subset_graph.Test_SaveLoadPmap.test_save_load_merge ... ok
> tests.test_subset_graph.Test_SaveLoadPmap.test_save_load_merge_2 ... ok
> tests.test_subset_graph.Test_SaveLoadPmap.test_save_merge_from_disk ... ok
> tests.test_subset_graph.Test_SaveLoadPmap.test_save_merge_from_disk_2 ... ok
> tests.test_subset_graph.test_output_partitions ... ok
> tests.test_subset_graph.test_tiny_real_partitions ... ok
> tests.test_subset_graph.test_small_real_partitions ... ok
> tests.test_threaded_sequence_processor.test_basic ... ok
> tests.test_threaded_sequence_processor.test_basic_fastq_like ... ok
> tests.test_threaded_sequence_processor.test_odd ... ok
> tests.test_threaded_sequence_processor.test_basic_2thread ... ok
> tests.test_threaded_sequence_processor.test_paired_2thread ... ok
> tests.test_threaded_sequence_processor.test_paired_2thread_more_seq ... ok
> 
> ----------------------------------------------------------------------
> Ran 233 tests in 20.632s
> 
> OK
> 
> 
> 
> On Tue, Aug 27, 2013 at 5:29 PM, C. Titus Brown <ctb at msu.edu> wrote:
> 
> > Hmm, make sure you've deleted old versions of khmer. What does 'make test'
> > report in the top khmer directory?
> >
> > ---
> > C. Titus Brown, ctb at msu.edu
> >
> > On Aug 27, 2013, at 17:27, Rajat Shuvro Roy <rajatroy at cs.rutgers.edu>
> > wrote:
> >
> > Thanks so much. I downloaded and compiled the latest version. make test
> > resulted in 'ok' for everything. However, when I tried to run it, I get the
> > following message:
> >
> > python load-into-counting.py -k 31 -x 5e10 out.kh 1Mreads.fa
> > Traceback (most recent call last):
> >   File "load-into-counting.py", line 13, in <module>
> >     from khmer.counting_args import build_construct_args, report_on_config
> > ImportError: cannot import name report_on_config
> >
> >
> >
> > On Tue, Aug 27, 2013 at 4:41 PM, C. Titus Brown <ctb at msu.edu> wrote:
> >
> >> Hi Rajat,
> >>
> >> Sorry for the long delay in response!
> >>
> >> On Thu, Jul 18, 2013 at 03:32:39PM -0400, Rajat Shuvro Roy wrote:
> >> > Hello Prof Brown,
> >> > I was attempting to produce a k-mer frequency distribution using khmer
> >> > and followed the instructions in
> >> > (http://khmer.readthedocs.org/en/latest/scripts.html). I have a Zea mays
> >> > library (SRR404240, 95.8 Gbp) and I executed the following command.
> >> >
> >> > python load-into-counting.py -k 31 -x 5e10 out.kh SRR404240.fasta
> >> >
> >> > I believe this counts k-mer frequencies, and the script
> >> > abundance-dist.py produces the distribution.
> >> >
> >> > We stopped it after it had run for 2464 mins (41 hrs) using 187 GB of
> >> > space. I tried with smaller values for -x but failed to complete the
> >> > computation in less than 3 days. Could you please let us know if this is
> >> > expected and whether we should allow more time? And is there a more
> >> > efficient way of using khmer?
> >>
> >> Your e-mail actually triggered some doc changes and updates ;).
> >>
> >> Briefly, khmer can count k-mers in either constant-memory mode or in
> >> accurate-large-counts mode.  In the former, counts are capped at 255
> >> (anything higher is no longer tracked), but the memory specified with the
> >> -N and -x parameters will be the total amount used; in the latter mode
> >> (which is the default), counts above 255 will be kept and memory use can
> >> grow without a fixed limit.
> >>
> >> You can use these modes easily in the latest khmer, the bleeding-edge
> >> branch; you can get that like so:
> >>
> >>         git clone https://github.com/ged-lab/khmer.git -b bleeding-edge
> >>
> >> Then use 'load-into-counting.py -b' to build the tables, and
> >> 'abundance-dist.py' to generate the output.
> >>
> >> I'd suggest running it on a small test data set (data/25k.fq.gz, in the
> >> khmer repo) just to make sure it all works for you, but it should - we use
> >> this regularly.
> >>
> >> Please let me know if you have any questions, and again, apologies for
> >> the delay!
> >>
> >> cheers,
> >> --titus
> >> --
> >> C. Titus Brown, ctb at msu.edu
> >>
> >
> >

-- 
C. Titus Brown, ctb at msu.edu



