[khmer] inconsistent unique k-mer counting
Joran Martijn
joran.martijn at icm.uu.se
Mon May 11 04:06:24 PDT 2015
Hej Titus,
Thanks for the quick reply! Here are the report files, which are
basically the STDERR and STDOUT output of the scripts.
Quick note before the reports: I made a small mistake in my
opening post. The coverage threshold I used for these reports was 5,
not 20.
Here is the report file of the first load-into-counting.py execution (on
the raw sequence data), test.ct.report:
|| This is the script 'load-into-counting.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the
following:
||
|| * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
|| * Q Zhang et al., http://dx.doi.org/10.1371/journal.pone.0101271
|| * A. Döring et al. http://dx.doi.org:80/10.1186/1471-2105-9-11
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for
details.
PARAMETERS:
- kmer size = 20 (-k)
- n tables = 4 (-N)
- min tablesize = 1.6e+10 (-x)
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
Saving k-mer counting table to test.ct
Loading kmers from sequences in ['test.fastq.gz']
making k-mer counting table
consuming input test.fastq.gz
Total number of unique k-mers: 3102943887
saving test.ct
fp rate estimated to be 0.008
DONE.
wrote to: test.ct.info
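As a quick sanity check on the "Estimated memory usage" line in that report: the figure is simply the product of the two table parameters, as the report itself says (n_tables x min_tablesize). A minimal sketch in Python; the helper name estimate_table_memory is only illustrative and is not part of khmer:

```python
# Reproduce the "Estimated memory usage" figure from the report.
# estimate_table_memory is an illustrative helper, not a khmer function.
def estimate_table_memory(n_tables, min_tablesize):
    """Return the estimated memory in bytes: n_tables x min_tablesize."""
    return n_tables * min_tablesize


# Values taken from the PARAMETERS section of the report (-N 4, -x 1.6e10).
estimated_bytes = estimate_table_memory(4, 1.6e10)
print("%.1e bytes (about %.0f GB)" % (estimated_bytes, estimated_bytes / 1e9))
```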
Here is the report file of the normalize-by-median.py run, test_k20_C5.report:
|| This is the script 'normalize-by-median.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the
following:
||
|| * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
|| * CT Brown et al., arXiv:1203.4802 [q-bio.GN]
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for
details.
PARAMETERS:
- kmer size = 20 (-k)
- n tables = 4 (-N)
- min tablesize = 1.6e+10 (-x)
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
... kept 58012 of 200000 or 29%
... in file test.fastq.gz
... kept 116210 of 400000 or 29%
... in file test.fastq.gz
..... etc etc etc .....
... kept 90482098 of 346200000 or 26%
... in file test.fastq.gz
... kept 90529526 of 346400000 or 26%
... in file test.fastq.gz
Total number of unique k-mers: 850221
loading k-mer counting table from test.ct
DONE with test.fastq.gz; kept 90547512 of 346477608 or 26%
output in test_k20_C5.fastq.gz.keep
fp rate estimated to be 0.008
And here is the second load-into-counting.py report, test2.ct.report:
|| This is the script 'load-into-counting.py' in khmer.
|| You are running khmer version 1.3
|| You are also using screed version 0.8
||
|| If you use this script in a publication, please cite EACH of the
following:
||
|| * MR Crusoe et al., 2014. http://dx.doi.org/10.6084/m9.figshare.979190
|| * Q Zhang et al., http://dx.doi.org/10.1371/journal.pone.0101271
|| * A. Döring et al. http://dx.doi.org:80/10.1186/1471-2105-9-11
||
|| Please see http://khmer.readthedocs.org/en/latest/citations.html for
details.
PARAMETERS:
- kmer size = 20 (-k)
- n tables = 4 (-N)
- min tablesize = 1.6e+10 (-x)
Estimated memory usage is 6.4e+10 bytes (n_tables x min_tablesize)
--------
Saving k-mer counting table to test2.ct
Loading kmers from sequences in ['test_k20_C5.fastq.gz.keep']
making k-mer counting table
consuming input test_k20_C5.fastq.gz.keep
Total number of unique k-mers: 2822473008
saving test2.ct
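To put the numbers from the three reports side by side, here is a small Python sketch. The values are copied from the reports; the variable and function names are only illustrative. It just computes the ratios and makes no claim about why the counts differ:

```python
# Figures copied from the three reports; names are illustrative only.
raw_unique_kmers = 3102943887        # load-into-counting.py on test.fastq.gz
normalize_reported_unique = 850221   # printed by normalize-by-median.py
kept_unique_kmers = 2822473008       # load-into-counting.py on the .keep file
reads_total = 346477608
reads_kept = 90547512


def fraction(part, whole):
    """Return part / whole as a float."""
    return part / float(whole)


read_fraction = fraction(reads_kept, reads_total)
kmer_fraction = fraction(kept_unique_kmers, raw_unique_kmers)
normalize_vs_raw = fraction(normalize_reported_unique, raw_unique_kmers)

print("reads kept:                 %.1f%%" % (100 * read_fraction))
print("unique k-mers kept:         %.1f%%" % (100 * kmer_fraction))
print("normalize-by-median vs raw: %.4f%%" % (100 * normalize_vs_raw))
```

So about a quarter of the reads are kept, while the unique k-mer count from the second load-into-counting.py run is still roughly nine tenths of the raw count, and the number printed by normalize-by-median.py is several orders of magnitude smaller than both.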
Hope this helps!
Joran
On 11/05/15 12:12, C. Titus Brown wrote:
> On Mon, May 11, 2015 at 11:29:31AM +0200, Joran Martijn wrote:
>> Dear Khmer mailing list,
>>
>> I'm trying to compare the number of unique k-mers (let's say 20-mers) in
>> the raw dataset and the diginormed dataset, similar to what was done in the
>> original diginorm paper.
> [ elided ]
>
> Hi Joran,
>
> that certainly doesn't sound good :). Would it be possible to convey the
> various report files to us, publicly or privately?
>
> thanks,
> --titus
>
> p.s. Thank you for the very detailed question!