[khmer] inconsistent unique k-mer counting

Joran Martijn joran.martijn at icm.uu.se
Mon May 11 04:06:24 PDT 2015


Hi Titus,

Thanks for the quick reply! Here are the report files, which are 
basically the STDERR and STDOUT output of the scripts.
Quick note before the reports: I made a small mistake in my 
opening post. The coverage threshold I tried for these reports was 5, 
not 20.

Here is the report file of the first load-into-counting.py execution (on 
the raw sequence data), test.ct.report:

|| This is the script 'load-into-counting.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the 
following:
||
||   * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
||   * Q Zhang et al., http://dx.doi.org/10.1371/journal.pone.0101271
||   * A. Döring et al. http://dx.doi.org:80/10.1186/1471-2105-9-11
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for 
details.


PARAMETERS:
  - kmer size =    20            (-k)
  - n tables =     4             (-N)
  - min tablesize = 1.6e+10      (-x)

Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
Saving k-mer counting table to test.ct
Loading kmers from sequences in ['test.fastq.gz']
making k-mer counting table
consuming input test.fastq.gz
Total number of unique k-mers: 3102943887
saving test.ct
fp rate estimated to be 0.008
DONE.
wrote to: test.ct.info

Here is the report file of normalize-by-median.py, test_k20_C5.report:

|| This is the script 'normalize-by-median.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the 
following:
||
||   * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
||   * CT Brown et al., arXiv:1203.4802 [q-bio.GN]
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for 
details.


PARAMETERS:
  - kmer size =    20            (-k)
  - n tables =     4             (-N)
  - min tablesize = 1.6e+10      (-x)

Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
... kept 58012 of 200000 or 29%
... in file test.fastq.gz
... kept 116210 of 400000 or 29%
... in file test.fastq.gz

..... etc etc etc .....

... kept 90482098 of 346200000 or 26%
... in file test.fastq.gz
... kept 90529526 of 346400000 or 26%
... in file test.fastq.gz
Total number of unique k-mers: 850221
loading k-mer counting table from test.ct
DONE with test.fastq.gz; kept 90547512 of 346477608 or 26%
output in test_k20_C5.fastq.gz.keep
fp rate estimated to be 0.008

And here is the second load-into-counting.py report, test2.ct.report:

|| This is the script 'load-into-counting.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the 
following:
||
||   * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
||   * Q Zhang et al., http://dx.doi.org/10.1371/journal.pone.0101271
||   * A. Döring et al. http://dx.doi.org:80/10.1186/1471-2105-9-11
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for 
details.


PARAMETERS:
  - kmer size =    20            (-k)
  - n tables =     4             (-N)
  - min tablesize = 1.6e+10      (-x)

Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
Saving k-mer counting table to test2.ct
Loading kmers from sequences in ['test_k20_C5.fastq.gz.keep']
making k-mer counting table
consuming input test_k20_C5.fastq.gz.keep
Total number of unique k-mers: 2822473008
saving test2.ct
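
For anyone who wants to put the three numbers side by side, here is a minimal Python sketch that pulls the "Total number of unique k-mers" line out of each report. The helper name unique_kmers and the regular expression are my own assumptions, based only on the wording of the reports quoted above; this is not part of khmer. The summary lines are pasted in as strings so the snippet runs on its own.

```python
import re

# Assumed pattern, matching the summary line as it appears in the reports above.
UNIQUE_KMERS = re.compile(r"Total number of unique k-mers:\s*(\d+)")


def unique_kmers(report_text):
    """Return the unique k-mer count found in report_text, or None if absent."""
    match = UNIQUE_KMERS.search(report_text)
    return int(match.group(1)) if match else None


# Summary lines copied verbatim from the three reports in this message.
reports = {
    "test.ct.report": "Total number of unique k-mers: 3102943887",
    "test_k20_C5.report": "Total number of unique k-mers: 850221",
    "test2.ct.report": "Total number of unique k-mers: 2822473008",
}

counts = {name: unique_kmers(text) for name, text in reports.items()}

# Print one aligned row per report.
for name, count in counts.items():
    print(f"{name:<20} {count:>12d}")
```

To run it on the real files, the strings in the reports dictionary could be replaced by the contents of each report file read from disk.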

Hope this helps!

Joran

On 11/05/15 12:12, C. Titus Brown wrote:
> On Mon, May 11, 2015 at 11:29:31AM +0200, Joran Martijn wrote:
>> Dear Khmer mailing list,
>>
>> I'm trying to compare the number of unique k-mers (let's say 20-mers) in
>> the raw dataset and the diginormed dataset, similar to what was done in
>> the original diginorm paper.
> [ elided ]
>
> Hi Joran,
>
> that certainly doesn't sound good :). Would it be possible to convey the
> various report files to us, publicly or privately?
>
> thanks,
> --titus
>
> p.s. Thank you for the very detailed question!
