[khmer] Questions about abundance-dist-single.py

Sun Dec 22 21:18:33 PST 2013

I have some additional explanations below that may supplement Titus' answer.

On Sun, Dec 22, 2013 at 11:28 PM, Tamer Mansour
<drtamermansour at gmail.com> wrote:
> Hi,
> I have a couple of questions about "abundance-dist-single.py":
> 1) The input file: should I concatenate the single reads files to the
> interleaved paired-ended files?

abundance-dist-single.py is a single-step, in-memory version of
abundance-dist.py; no counting hash file is created unless
--savehash is specified. It is meant for counting k-mers in a single
(one) sequence file.
The name has nothing to do with single-ended versus paired-ended reads.
If you want to count k-mers across multiple files, use
load-into-counting.py followed by abundance-dist.py.
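
To make the idea of an abundance distribution concrete, here is a
minimal Python sketch. It is not how khmer works internally: it uses an
exact in-memory Counter rather than khmer's counting hash, it does not
treat a k-mer and its reverse complement as the same k-mer, and the
function names and toy reads are made up for illustration. It only
shows what "count k-mers, then tabulate how many k-mers have each
count" means, and that counting over several inputs amounts to feeding
all of the reads into one counter.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every k-mer exactly, in memory, across an iterable of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def abundance_distribution(counts):
    """Map each abundance to the number of distinct k-mers with that abundance."""
    return Counter(counts.values())


# Toy reads standing in for the contents of two sequence files (hypothetical data).
reads_file_a = ["ACGTACGT", "ACGTTT"]
reads_file_b = ["ACGTACGA"]

# Counting across several inputs: all reads go into the same counter.
combined = count_kmers(reads_file_a + reads_file_b, k=4)
dist = abundance_distribution(combined)

for abundance in sorted(dist):
    print(abundance, dist[abundance])
```

Each printed line is an abundance followed by the number of distinct
k-mers seen that many times.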

> 2) Does the '-b' option increase or decrease the chance of false positive
> results? Is it not recommended?

If the '-b' option is turned on, the script runs in constant memory and
k-mer counts stop at 255.
This should not influence the false positive rate of counting. The only
effect is that the reported count of a high-abundance k-mer will be at
most 255, even when its real frequency is higher. If you can tolerate
that inaccuracy for high-abundance k-mers, you can turn it on, since it
saves memory.
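
The effect of the cap can be sketched in a few lines of Python. This is
only an illustration of a counter that saturates at 255, not khmer's
actual code; 255 happens to be the largest value an unsigned 8-bit
counter can hold, and I am assuming that is where the limit comes from.
The names and example counts below are made up.

```python
MAX_COUNT = 255  # cap on the reported count (assumed to be an 8-bit counter limit)


def capped(true_count, max_count=MAX_COUNT):
    """Return the count as a saturating counter would report it."""
    return min(true_count, max_count)


# Hypothetical true abundances for four k-mers.
true_counts = {"low": 3, "medium": 120, "high": 255, "very_high": 10_000}

reported = {name: capped(count) for name, count in true_counts.items()}

for name, count in true_counts.items():
    print(name, "true:", count, "reported:", reported[name])
```

Counts at or below 255 are reported unchanged; only the k-mers above
the cap lose information, which is why the cap matters only for the
high-abundance tail of the distribution.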

> 3) Does the '-savehash' option require increasing the job resources (either
> time or RAM)?

It only saves the hash to the hard disk for further use, so it will not
increase time or memory usage.
It will, however, consume some hard disk storage.

There are some more explanations of the scripts here:
http://khmer.readthedocs.org/en/latest/scripts.html

>
> Thank you
>
> Tamer