[khmer] Extracting the original reads after diginorm + partitioning
C. Titus Brown
ctb at msu.edu
Sun Mar 10 21:33:50 PDT 2013
On Thu, Mar 07, 2013 at 02:10:17PM -0500, Adina Chuang Howe wrote:
> Possible bug in sweep-reads...I'm not recovering the partitioned reads
> from the original dataset.
>
> First observed this when I looked at lots of partitions and tried to
> recover swept reads - some swept files would show up empty:
>
> command:
> python sweep-reads3.py -N 4 -k 32 -x 1e9
> /mnt/research/gpgc/hmp-mock-partitions/001264-files/no-sweep-pids/pid*fa
> /mnt/research/gpgc/hmp-mock-partitions/SRR-combined.fastq
>
> troubleshooting:
> Then I looked at just one partition:
> on HPC: /mnt/scratch/howead/test
> python sweep-reads3.py pid-42391.fa SRR-combined.fastq
>
> And resulting sweepfile is empty.
>
> If I run:
> python sweep-reads3.py pid-42391.fa pid-42391.fa
>
> Behavior is correct.
Very weird. I don't see anything obviously wrong in the script, which
just means it's a subtle and deep bug.

Two questions --

- is it reproducible, i.e. do you get the same results every time you
  run it? (please yes)

- can you break it down to a smaller failure point than with SRR-combined,
  e.g. maybe a few hundred k reads?
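
One rough way to get at both questions, as a sketch only: the helper names below (take_first_reads, file_digest) and the output file names are made up for illustration and are not part of khmer. It assumes SRR-combined.fastq uses plain 4-line FASTQ records with no wrapped sequence lines. The first helper copies the first N reads into a smaller file; the second hashes a file so that the sweep output from two runs can be compared byte for byte.

```python
import hashlib
import itertools


def take_first_reads(src_path, dst_path, n_reads):
    """Copy the first n_reads records of a 4-line-per-record FASTQ file.

    Assumes unwrapped FASTQ (exactly 4 lines per read).
    Returns the number of complete records written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        while written < n_reads:
            record = list(itertools.islice(src, 4))
            if len(record) < 4:
                break  # end of file, or a truncated trailing record
            dst.writelines(record)
            written += 1
    return written


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Hypothetical file names; 300000 is just "a few hundred k".
    n = take_first_reads("SRR-combined.fastq", "SRR-subset.fastq", 300000)
    print("wrote %d reads" % n)
```

The subset can then stand in for SRR-combined.fastq in the single-partition test (python sweep-reads3.py pid-42391.fa SRR-subset.fastq). Note that the first few hundred thousand reads may not include any reads from pid-42391, so an empty result on the subset is only informative once that has been checked. For reproducibility, running the same command twice and comparing file_digest() of the two sweep files shows whether the output is identical.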
thanks,
--titus
--
C. Titus Brown, ctb at msu.edu