<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi Titus,<br>
<br>
Thanks for the quick reply! Here are the report files, which are
basically the STDERR and STDOUT output of the scripts.<br>
One correction before the reports: I made a small mistake in my
original post. The coverage threshold I used for these runs was
5, not 20.<br>
<br>
Here is the report file from the first load-into-counting.py run
(on the raw sequence data), test.ct.report:<br>
<br>
<font color="#990000">|| This is the script 'load-into-counting.py'
in khmer.<br>
|| You are running khmer version 1.3<br>
|| You are also using screed version 0.8<br>
||<br>
|| If you use this script in a publication, please cite EACH of
the following:<br>
||<br>
|| * MR Crusoe et al., 2014.
<a class="moz-txt-link-freetext" href="http://dx.doi.org/10.6084/m9.figshare.979190">http://dx.doi.org/10.6084/m9.figshare.979190</a><br>
|| * Q Zhang et al.,
<a class="moz-txt-link-freetext" href="http://dx.doi.org/10.1371/journal.pone.0101271">http://dx.doi.org/10.1371/journal.pone.0101271</a><br>
|| * A. D&ouml;ring et al.
<a class="moz-txt-link-freetext" href="http://dx.doi.org:80/10.1186/1471-2105-9-11">http://dx.doi.org:80/10.1186/1471-2105-9-11</a><br>
||<br>
|| Please see
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/citations.html">http://khmer.readthedocs.org/en/latest/citations.html</a> for details.<br>
<br>
<br>
PARAMETERS:<br>
- kmer size = 20 (-k)<br>
- n tables = 4 (-N)<br>
- min tablesize = 1.6e+10 (-x)<br>
<br>
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)<br>
--------<br>
Saving k-mer counting table to test.ct<br>
Loading kmers from sequences in ['test.fastq.gz']<br>
making k-mer counting table<br>
consuming input test.fastq.gz<br>
Total number of unique k-mers: 3102943887<br>
saving test.ct<br>
fp rate estimated to be 0.008<br>
DONE.<br>
wrote to: test.ct.info</font><br>
<br>
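As a quick check on the "Estimated memory usage" line in the report
above, here is a minimal Python sketch of the arithmetic it states
(n_tables x min_tablesize). The variable names are mine and not part
of khmer, and the GiB conversion is only added for readability:<br>
<br>
<pre wrap="">
```python
# Values copied from the PARAMETERS section of test.ct.report.
n_tables = 4            # -N
min_tablesize = 1.6e10  # -x

# The report gives the estimate as n_tables x min_tablesize, in bytes.
estimated_bytes = n_tables * min_tablesize

# Convert to GiB (2**30 bytes) purely for readability.
estimated_gib = estimated_bytes / 2**30

print("estimated memory: {:.1e} bytes".format(estimated_bytes))
print("estimated memory: {:.1f} GiB".format(estimated_gib))
```
</pre>
With -N 4 and -x 1.6e10 this reproduces the 6.4e+10 bytes from the
report, which is a bit under 60 GiB.<br>
<br>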
Here is the report file from normalize-by-median.py,
test_k20_C5.report:<br>
<br>
<font color="#990000">|| This is the script 'normalize-by-median.py'
in khmer.<br>
|| You are running khmer version 1.3<br>
|| You are also using screed version 0.8<br>
||<br>
|| If you use this script in a publication, please cite EACH of
the following:<br>
||<br>
|| * MR Crusoe et al., 2014.
<a class="moz-txt-link-freetext" href="http://dx.doi.org/10.6084/m9.figshare.979190">http://dx.doi.org/10.6084/m9.figshare.979190</a><br>
|| * CT Brown et al., arXiv:1203.4802 [q-bio.GN]<br>
||<br>
|| Please see
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/citations.html">http://khmer.readthedocs.org/en/latest/citations.html</a> for details.<br>
<br>
<br>
PARAMETERS:<br>
- kmer size = 20 (-k)<br>
- n tables = 4 (-N)<br>
- min tablesize = 1.6e+10 (-x)<br>
<br>
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)<br>
--------<br>
... kept 58012 of 200000 or 29%<br>
... in file test.fastq.gz<br>
... kept 116210 of 400000 or 29%<br>
... in file test.fastq.gz<br>
<br>
<font color="#000000">..... etc etc etc .....</font><br>
<br>
... kept 90482098 of 346200000 or 26%<br>
... in file test.fastq.gz<br>
... kept 90529526 of 346400000 or 26%<br>
... in file test.fastq.gz<br>
Total number of unique k-mers: 850221<br>
loading k-mer counting table from test.ct<br>
DONE with test.fastq.gz; kept 90547512 of 346477608 or 26%<br>
output in test_k20_C5.fastq.gz.keep<br>
fp rate estimated to be 0.008</font><br>
<br>
And here is the second load-into-counting.py report, test2.ct.report:<br>
<br>
<font color="#990000">|| This is the script 'load-into-counting.py'
in khmer.<br>
|| You are running khmer version 1.3<br>
|| You are also using screed version 0.8<br>
||<br>
|| If you use this script in a publication, please cite EACH of
the following:<br>
||<br>
|| * MR Crusoe et al., 2014.
<a class="moz-txt-link-freetext" href="http://dx.doi.org/10.6084/m9.figshare.979190">http://dx.doi.org/10.6084/m9.figshare.979190</a><br>
|| * Q Zhang et al.,
<a class="moz-txt-link-freetext" href="http://dx.doi.org/10.1371/journal.pone.0101271">http://dx.doi.org/10.1371/journal.pone.0101271</a><br>
|| * A. D&ouml;ring et al.
<a class="moz-txt-link-freetext" href="http://dx.doi.org:80/10.1186/1471-2105-9-11">http://dx.doi.org:80/10.1186/1471-2105-9-11</a><br>
||<br>
|| Please see
<a class="moz-txt-link-freetext" href="http://khmer.readthedocs.org/en/latest/citations.html">http://khmer.readthedocs.org/en/latest/citations.html</a> for details.<br>
<br>
<br>
PARAMETERS:<br>
- kmer size = 20 (-k)<br>
- n tables = 4 (-N)<br>
- min tablesize = 1.6e+10 (-x)<br>
<br>
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)<br>
--------<br>
Saving k-mer counting table to test2.ct<br>
Loading kmers from sequences in ['test_k20_C5.fastq.gz.keep']<br>
making k-mer counting table<br>
consuming input test_k20_C5.fastq.gz.keep<br>
Total number of unique k-mers: 2822473008<br>
saving test2.ct<br>
</font><br>
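<br>
To put the numbers from the three reports side by side, here is a
small Python sketch (plain arithmetic, no khmer calls; the variable
and function names are just mine for illustration). It compares the
fraction of reads kept with the fraction of unique 20-mers that
remain after normalization:<br>
<br>
<pre wrap="">
```python
# Numbers copied from the three reports above.
raw_reads = 346477608           # reads in test.fastq.gz
kept_reads = 90547512           # reads kept by normalize-by-median.py
raw_unique_kmers = 3102943887   # from test.ct.report
norm_unique_kmers = 2822473008  # from test2.ct.report


def fraction(part, whole):
    """Return part / whole as a float."""
    return float(part) / whole


reads_kept = fraction(kept_reads, raw_reads)
kmers_kept = fraction(norm_unique_kmers, raw_unique_kmers)

print("reads kept:          {:.1%}".format(reads_kept))
print("unique 20-mers kept: {:.1%}".format(kmers_kept))
```
</pre>
So roughly a quarter of the reads are kept, while the reported
unique k-mer count only drops by about 9%.<br>
<br>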
Hope this helps!<br>
<br>
Joran<br>
<br>
<div class="moz-cite-prefix">On 11/05/15 12:12, C. Titus Brown
wrote:<br>
</div>
<blockquote cite="mid:20150511101252.GA3199@idyll.org" type="cite">
<pre wrap="">On Mon, May 11, 2015 at 11:29:31AM +0200, Joran Martijn wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Dear Khmer mailing list,
I'm trying to compare the number of unique k-mers (lets say 20-mers) in
the raw dataset and diginormed dataset, similar as was done in the
original diginorm paper.
</pre>
</blockquote>
<pre wrap="">
[ elided ]
Hi Joran,
that certainly doesn't sound good :). Would it be possible to convey the
various report files to us, publicly or privately?
thanks,
--titus
p.s. Thank you for the very detailed question!
</pre>
</blockquote>
<br>
</body>
</html>