[khmer] Using khmer for producing k-mer frequency distribution

Rajat Shuvro Roy rajatroy at cs.rutgers.edu
Tue Aug 27 14:35:49 PDT 2013


The new version is in a completely new directory. 'make test' gives:

make test
cd lib && \
make
make[1]: Entering directory `/u2/home/rajatroy/khmer/lib'
make[1]: Nothing to be done for `all'.
make[1]: Leaving directory `/u2/home/rajatroy/khmer/lib'
cd python && \
make    DEFINE_KHMER_EXTRA_SANITY_CHECKS="" \
        CXX_DEBUG_FLAGS=""
make[1]: Entering directory `/u2/home/rajatroy/khmer/python'
python setup.py build_ext -i
running build_ext
copying build/lib.linux-x86_64-2.7/khmer/_khmermodule.so -> khmer
make[1]: Leaving directory `/u2/home/rajatroy/khmer/python'
nosetests -v -x -a \!known_failing
tests.test_align.test_alignnocov ... ok
tests.test_align.test_readalign ... ok
tests.test_align.test_alignerrorregion ... ok
tests.test_c_wrapper.test_raise_in_consume_fasta ... ok
tests.test_c_wrapper.test_raise_in_fasta_file_to_minmax ... ok
tests.test_counting_hash.Test_CountingHash.test_collision_1 ... ok
tests.test_counting_hash.Test_CountingHash.test_collision_2 ... ok
tests.test_counting_hash.Test_CountingHash.test_collision_3 ... ok
tests.test_counting_hash.test_3_tables ... ok
tests.test_counting_hash.test_simple_median ... ok
tests.test_counting_hash.test_simple_kadian ... ok
tests.test_counting_hash.test_simple_kadian_2 ... ok
tests.test_counting_hash.test_2_kadian ... ok
tests.test_counting_hash.test_save_load ... ok
tests.test_counting_hash.test_load_gz ... ok
tests.test_counting_hash.test_save_load_gz ... ok
tests.test_counting_hash.test_trim_full ... ok
tests.test_counting_hash.test_trim_short ... ok
tests.test_counting_hash.test_maxcount ... ok
tests.test_counting_hash.test_maxcount_with_bigcount ... ok
tests.test_counting_hash.test_maxcount_with_bigcount_save ... ok
tests.test_counting_hash.test_bigcount_save ... ok
tests.test_counting_hash.test_nobigcount_save ... ok
tests.test_counting_hash.test_bigcount_abund_dist ... ok
tests.test_counting_hash.test_bigcount_abund_dist_2 ... ok
tests.test_counting_hash.test_bigcount_overflow ... ok
tests.test_counting_hash.test_get_ksize ... ok
tests.test_counting_hash.test_get_hashsizes ... ok
tests.test_counting_single.Test_AbundanceDistribution.test_count_A ... ok
tests.test_counting_single.Test_ConsumeString.test_abundance_by_pos ... ok
tests.test_counting_single.Test_ConsumeString.test_abundance_by_pos_bigcount ... ok
tests.test_counting_single.Test_ConsumeString.test_bounded ... ok
tests.test_counting_single.Test_ConsumeString.test_bounded_2 ... ok
tests.test_counting_single.Test_ConsumeString.test_bounded_2_rc ... ok
tests.test_counting_single.Test_ConsumeString.test_bounded_rc ... ok
tests.test_counting_single.Test_ConsumeString.test_max_count ... ok
tests.test_counting_single.Test_ConsumeString.test_max_count_in_bound ... ok
tests.test_counting_single.Test_ConsumeString.test_max_count_out_bound ... ok
tests.test_counting_single.Test_ConsumeString.test_min_count ... ok
tests.test_counting_single.Test_ConsumeString.test_min_count_in_bound ... ok
tests.test_counting_single.Test_ConsumeString.test_min_count_out_bound ... ok
tests.test_counting_single.Test_ConsumeString.test_n_occupied ... ok
tests.test_counting_single.Test_ConsumeString.test_n_occupied_args ... ok
tests.test_counting_single.Test_ConsumeString.test_simple ... ok
tests.test_counting_single.Test_ConsumeString.test_simple_2 ... ok
tests.test_counting_single.Test_ConsumeString.test_simple_rc ... ok
tests.test_counting_single.test_no_collision ... ok
tests.test_counting_single.test_collision ... ok
tests.test_counting_single.test_complete_no_collision ... ok
tests.test_counting_single.test_complete_2_collision ... ok
tests.test_counting_single.test_complete_4_collision ... ok
tests.test_counting_single.test_maxcount ... ok
tests.test_counting_single.test_maxcount_with_bigcount ... ok
tests.test_counting_single.test_consume_uniqify_first ... ok
tests.test_counting_single.test_maxcount_consume ... ok
tests.test_counting_single.test_maxcount_consume_with_bigcount ... ok
tests.test_counting_single.test_get_mincount ... ok
tests.test_counting_single.test_get_maxcount ... ok
tests.test_counting_single.test_get_maxcount_rc ... ok
tests.test_counting_single.test_get_mincount_rc ... ok
tests.test_counting_single.test_64bitshift ... ok
tests.test_counting_single.test_64bitshift_2 ... ok
tests.test_counting_single.test_very_short_read ... ok
tests.test_filter.Test_Filter.test_abund ... ok
tests.test_filter.test_filter_sodd ... ok
tests.test_functions.test_forward_hash ... ok
tests.test_functions.test_forward_hash_no_rc ... ok
tests.test_functions.test_reverse_hash ... ok
tests.test_functions.test_get_primes ... ok
tests.test_graph.Test_ExactGraphFu.test_counts ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_next_a ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_next_c ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_next_g ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_next_t ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_a ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_c ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_g ... ok
tests.test_graph.Test_ExactGraphFu.test_graph_links_prev_t ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_next_a ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_next_c ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_next_g ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_next_t ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_a ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_c ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_g ... ok
tests.test_graph.Test_InexactGraphFu.test_graph_links_prev_t ... ok
tests.test_graph.Test_Partitioning.test_connected_20_a ... ok
tests.test_graph.Test_Partitioning.test_connected_20_b ... ok
tests.test_graph.Test_Partitioning.test_connected_31_c ... ok
tests.test_graph.Test_Partitioning.test_disconnected_20_a ... ok
tests.test_graph.Test_Partitioning.test_disconnected_20_b ... ok
tests.test_graph.Test_Partitioning.test_disconnected_31_c ... ok
tests.test_graph.Test_Partitioning.test_not_output_unassigned ... ok
tests.test_graph.Test_Partitioning.test_output_unassigned ... ok
tests.test_graph.Test_PythonAPI.test_ordered_connect ... ok
tests.test_hashbits.test__get_set_tag_density ... ok
tests.test_hashbits.test_n_occupied_1 ... ok
tests.test_hashbits.test_bloom_python_1 ... ok
tests.test_hashbits.test_bloom_c_1 ... ok
tests.test_hashbits.test_n_occupied_2 ... ok
tests.test_hashbits.test_bloom_c_2 ... ok
tests.test_hashbits.test_filter_if_present ... ok
tests.test_hashbits.test_combine_pe ... ok
tests.test_hashbits.test_load_partitioned ... ok
tests.test_hashbits.test_count_within_radius_simple ... ok
tests.test_hashbits.test_count_within_radius_big ... ok
tests.test_hashbits.test_count_kmer_degree ... ok
tests.test_hashbits.test_find_radius_for_volume ... ok
tests.test_hashbits.test_circumference ... ok
tests.test_hashbits.test_save_load_tagset ... ok
tests.test_hashbits.test_save_load_tagset_noclear ... ok
tests.test_hashbits.test_stop_traverse ... ok
tests.test_hashbits.test_tag_across_stoptraverse ... ok
tests.test_hashbits.test_notag_across_stoptraverse ... ok
tests.test_hashbits.test_find_stoptags ... ok
tests.test_hashbits.test_find_stoptags2 ... ok
tests.test_hashbits.test_get_ksize ... ok
tests.test_hashbits.test_get_hashsizes ... ok
tests.test_hashbits.test_extract_unique_paths_0 ... ok
tests.test_hashbits.test_extract_unique_paths_1 ... ok
tests.test_hashbits.test_extract_unique_paths_2 ... ok
tests.test_hashbits.test_extract_unique_paths_3 ... ok
tests.test_hashbits.test_extract_unique_paths_4 ... ok
tests.test_hashbits.test_find_unpart ... ok
tests.test_hashbits.test_find_unpart_notraverse ... ok
tests.test_hashbits.test_find_unpart_fail ... ok
tests.test_hashbits.test_simple_median ... ok
Verify that 'has_extra_sanity_checks' exists. ... ok
Verify that all of the various attributes exist. ... ok
Verify that all of the various attributes exist. ... ok
Verify that all of the various attributes exist. ... ok
Verify that all of the various attributes exist. ... ok
Verify that the number of threads set is what is reported. ... ok
Verify that the reads file chunk size is what is reported. ... ok
tests.test_ktable.Test_KTable.test_basic ... ok
tests.test_ktable.Test_KTable.test_clear ... ok
tests.test_ktable.Test_KTable.test_consume ... ok
tests.test_ktable.Test_KTable.test_hash ... ok
tests.test_ktable.Test_KTable.test_intersection ... ok
tests.test_ktable.Test_KTable.test_operator_in ... ok
tests.test_ktable.Test_KTable.test_populate ... ok
tests.test_ktable.Test_KTable.test_update ... ok
tests.test_ktable.test_rc ... ok
tests.test_ktable.test_KmerCount ... ok
tests.test_lump.test_fakelump_together ... ok
tests.test_lump.test_fakelump_stop ... ok
tests.test_lump.test_fakelump_stop2 ... ok
tests.test_lump.test_fakelump_repartitioning ... ok
tests.test_minmax.Test_Basic.test_max_1 ... ok
tests.test_minmax.Test_Basic.test_max_2 ... ok
tests.test_minmax.Test_Basic.test_merge_1 ... ok
tests.test_minmax.Test_Basic.test_merge_2 ... ok
tests.test_minmax.Test_Basic.test_merge_3 ... ok
tests.test_minmax.Test_Basic.test_merge_4 ... ok
tests.test_minmax.Test_Basic.test_min_1 ... ok
tests.test_minmax.Test_Basic.test_min_2 ... ok
tests.test_minmax.Test_Basic.test_tablesize ... ok
tests.test_minmax.Test_Filestuff.test_save_no_load ... ok
tests.test_minmax.Test_Filestuff.test_saveload ... ok
tests.test_read_parsers.test_read_properties ... ok
tests.test_read_parsers.test_with_default_arguments ... ok
tests.test_read_parsers.test_gzip_decompression ... ok
tests.test_read_parsers.test_bzip2_decompression ... ok
tests.test_read_parsers.test_with_multiple_threads ... ok
tests.test_read_parsers.test_old_illumina_pair_mating ... ok
tests.test_read_parsers.test_casava_1_8_pair_mating ... ok
tests.test_read_parsers.test_iterator_identities ... ok
tests.test_read_parsers.test_read_pair_iterator_in_error_mode_xfail ... ok
tests.test_scripts.test_load_into_counting ... ok
tests.test_scripts.test_load_into_counting_fail ... ok
tests.test_scripts.test_filter_abund_1 ... ok
tests.test_scripts.test_filter_abund_2 ... ok
tests.test_scripts.test_filter_abund_3_fq_retained ... ok
tests.test_scripts.test_filter_abund_1_singlefile ... ok
tests.test_scripts.test_filter_abund_4_retain_low_abund ... ok
tests.test_scripts.test_filter_abund_5_trim_high_abund ... ok
tests.test_scripts.test_filter_abund_6_trim_high_abund_Z ... ok
tests.test_scripts.test_filter_stoptags ... ok
tests.test_scripts.test_normalize_by_median ... ok
tests.test_scripts.test_normalize_by_median_2 ... ok
tests.test_scripts.test_normalize_by_median_paired ... ok
tests.test_scripts.test_normalize_by_median_impaired ... ok
tests.test_scripts.test_normalize_by_median_force ... ok
tests.test_scripts.test_normalize_by_median_dumpfrequency ... ok
tests.test_scripts.test_normalize_by_median_empty ... ok
tests.test_scripts.test_count_median ... ok
tests.test_scripts.test_load_graph ... ok
tests.test_scripts.test_load_graph_no_tags ... ok
tests.test_scripts.test_load_graph_fail ... ok
tests.test_scripts.test_partition_graph_1 ... ok
tests.test_scripts.test_partition_graph_nojoin_k21 ... ok
tests.test_scripts.test_partition_graph_nojoin_stoptags ... ok
tests.test_scripts.test_partition_graph_big_traverse ... ok
tests.test_scripts.test_partition_graph_no_big_traverse ... ok
tests.test_scripts.test_annotate_partitions ... ok
tests.test_scripts.test_annotate_partitions_2 ... ok
tests.test_scripts.test_extract_partitions ... ok
tests.test_scripts.test_abundance_dist ... ok
tests.test_scripts.test_abundance_dist_nobigcount ... ok
tests.test_scripts.test_abundance_dist_single ... ok
tests.test_scripts.test_abundance_dist_single_nobigcount ... ok
tests.test_scripts.test_do_partition ... ok
tests.test_scripts.test_do_partition_2 ... ok
tests.test_scripts.test_interleave_reads_1_fq ... ok
tests.test_scripts.test_interleave_reads_2_fa ... ok
tests.test_scripts.test_extract_paired_reads_1_fa ... ok
tests.test_scripts.test_extract_paired_reads_2_fq ... ok
tests.test_scripts.test_split_paired_reads_1_fa ... ok
tests.test_scripts.test_split_paired_reads_2_fq ... ok
tests.test_split.test_2_split ... ok
tests.test_split.test_n_split ... ok
tests.test_split.test_n3_split ... ok
tests.test_subset_graph.Test_RandomData.test_3_merge_013 ... ok
tests.test_subset_graph.Test_RandomData.test_3_merge_023 ... ok
tests.test_subset_graph.Test_RandomData.test_5_merge_046 ... ok
tests.test_subset_graph.Test_RandomData.test_random_20_a_succ ... ok
tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_II ... ok
tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_III ... ok
tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_IV ... ok
tests.test_subset_graph.Test_RandomData.test_random_20_a_succ_IV_save ... ok
tests.test_subset_graph.Test_SaveLoadPmap.test_save_load_merge ... ok
tests.test_subset_graph.Test_SaveLoadPmap.test_save_load_merge_2 ... ok
tests.test_subset_graph.Test_SaveLoadPmap.test_save_merge_from_disk ... ok
tests.test_subset_graph.Test_SaveLoadPmap.test_save_merge_from_disk_2 ... ok
tests.test_subset_graph.test_output_partitions ... ok
tests.test_subset_graph.test_tiny_real_partitions ... ok
tests.test_subset_graph.test_small_real_partitions ... ok
tests.test_threaded_sequence_processor.test_basic ... ok
tests.test_threaded_sequence_processor.test_basic_fastq_like ... ok
tests.test_threaded_sequence_processor.test_odd ... ok
tests.test_threaded_sequence_processor.test_basic_2thread ... ok
tests.test_threaded_sequence_processor.test_paired_2thread ... ok
tests.test_threaded_sequence_processor.test_paired_2thread_more_seq ... ok

----------------------------------------------------------------------
Ran 233 tests in 20.632s

OK
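
Since the tests pass in the new directory but the script still fails on import, it may help to check which copy of khmer Python actually loads. The following is a minimal sketch, not part of khmer: describe_import is a hypothetical helper name, and it only reports where a module was loaded from and whether a given name exists in it.

```python
import importlib


def describe_import(module_name, attr_name):
    """Return (file path the module was loaded from, whether attr_name exists)."""
    module = importlib.import_module(module_name)
    path = getattr(module, "__file__", None)  # built-in modules have no file
    return path, hasattr(module, attr_name)


# Check the import that fails in the traceback quoted below.
try:
    print(describe_import("khmer.counting_args", "report_on_config"))
except ImportError as exc:
    print("khmer is not importable here: %s" % exc)
```

If the printed path points outside the new directory, an older installed copy is probably shadowing the freshly built one.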



On Tue, Aug 27, 2013 at 5:29 PM, C. Titus Brown <ctb at msu.edu> wrote:

> Hmm, make sure you've deleted old versions of Khmer. What does 'make test'
> report in the top Khmer directory?
>
> ---
> C. Titus Brown, ctb at msu.edu
>
> On Aug 27, 2013, at 17:27, Rajat Shuvro Roy <rajatroy at cs.rutgers.edu>
> wrote:
>
> Thanks so much. I downloaded and compiled the latest version. make test
> resulted in 'ok' for everything. However, when I tried to run it, I get the
> following message:
>
> python load-into-counting.py -k 31 -x 5e10 out.kh 1Mreads.fa
> Traceback (most recent call last):
>   File "load-into-counting.py", line 13, in <module>
>     from khmer.counting_args import build_construct_args, report_on_config
> ImportError: cannot import name report_on_config
>
>
>
> On Tue, Aug 27, 2013 at 4:41 PM, C. Titus Brown <ctb at msu.edu> wrote:
>
>> Hi Rajat,
>>
>> Sorry for the long delay in response!
>>
>> On Thu, Jul 18, 2013 at 03:32:39PM -0400, Rajat Shuvro Roy wrote:
>> > Hello Prof Brown,
>> > I was attempting to produce a k-mer frequency distribution using khmer
>> > and followed the instructions at
>> > http://khmer.readthedocs.org/en/latest/scripts.html. I have a Zea mays
>> > library (SRR404240, 95.8 Gbp) and I executed the following command.
>> >
>> > python load-into-counting.py -k 31 -x 5e10 out.kh SRR404240.fasta
>> >
>> > I believe this counts k-mer frequencies, and the script abundance-dist.py
>> > produces the distribution.
>> >
>> > We stopped it after it had run for 2464 mins (41 hrs) using 187 GB of
>> > space. I tried smaller values for -x but failed to complete the
>> > computation in less than 3 days. Could you please let us know if this is
>> > expected and whether we should allow more time? And is there a more
>> > efficient way of using khmer?
>>
>> Your e-mail actually triggered some doc changes and updates ;).
>>
>> Briefly, khmer can count k-mers in either constant-memory mode or in
>> accurate-large-counts mode.  In the former, counts above 255 will
>> stop being counted, but the memory specified with the -N and -x parameters
>> will be the total amount used; in the latter mode (which is the default),
>> counts above 255 will be kept and memory use will expand indefinitely.
>>
>> You can use these modes easily in the latest khmer, the bleeding-edge
>> branch; you can get that like so:
>>
>>         git clone https://github.com/ged-lab/khmer.git -b bleeding-edge
>>
>> Then use 'load-into-counting.py -b' to build the tables, and
>> 'abundance-dist' to generate the output.
>>
>> I'd suggest running it on a small test data set (data/25k.fq.gz, in the
>> khmer repo) just to make sure it all works for you, but it should - we use
>> this regularly.
>>
>> Please let me know if you have any questions, and again, apologies for
>> the delay!
>>
>> cheers,
>> --titus
>> --
>> C. Titus Brown, ctb at msu.edu
>>
>
>
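
The two counting modes described in the quoted reply can be illustrated with a toy model. This is a sketch under stated assumptions, not khmer's implementation: ToyCounter, its single table, the Python dict used for overflow, and estimate_table_bytes are hypothetical names for illustration. The memory estimate assumes one byte per table entry and four tables (an assumed value for -N, which the thread does not state).

```python
MAX_SMALL_COUNT = 255  # largest value an 8-bit counter can hold


class ToyCounter:
    """Toy single-table k-mer counter; not khmer's actual data structure."""

    def __init__(self, table_size, bigcount):
        self.table = bytearray(table_size)  # fixed memory: one byte per slot
        self.bigcount = bigcount
        self.overflow = {}                  # grows only when bigcount is on

    def add(self, kmer):
        slot = hash(kmer) % len(self.table)
        if self.table[slot] < MAX_SMALL_COUNT:
            self.table[slot] += 1
        elif self.bigcount:
            # Keep counting past 255 in a side structure that can grow.
            self.overflow[slot] = self.overflow.get(slot, MAX_SMALL_COUNT) + 1
        # Otherwise the counter stays saturated at 255.

    def get(self, kmer):
        slot = hash(kmer) % len(self.table)
        return self.overflow.get(slot, self.table[slot])


def estimate_table_bytes(n_tables, table_size):
    """Fixed table memory, assuming one byte per entry (an assumption)."""
    return int(n_tables * table_size)


capped = ToyCounter(1024, bigcount=False)
big = ToyCounter(1024, bigcount=True)
for _ in range(300):
    capped.add("ACGT")
    big.add("ACGT")
print(capped.get("ACGT"), big.get("ACGT"))
print(estimate_table_bytes(4, 5e10) / 2.0 ** 30)  # GiB for -x 5e10, 4 tables
```

Under these assumptions, -x 5e10 with four tables comes to roughly 186 GiB of fixed table memory, which is in the same range as the 187GB reported in the original message.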