[khmer] Extracting the original reads after diginorm + partitioning

C. Titus Brown ctb at msu.edu
Sun Mar 10 21:33:50 PDT 2013


On Thu, Mar 07, 2013 at 02:10:17PM -0500, Adina Chuang Howe wrote:
> Possible bug in sweep-reads...I'm not recovering the partitioned reads
> from the original dataset.
> 
> First observed this when I was looking at lots of partitions and trying
> to recover swept reads; some swept files would show up empty:
> 
> command:
> python sweep-reads3.py -N 4 -k 32 -x 1e9
> /mnt/research/gpgc/hmp-mock-partitions/001264-files/no-sweep-pids/pid*fa
> /mnt/research/gpgc/hmp-mock-partitions/SRR-combined.fastq
> 
> troubleshooting:
> Then I looked at just one partition:
> on HPC:  /mnt/scratch/howead/test
> python sweep-reads3.py pid-42391.fa SRR-combined.fastq
> 
> And resulting sweepfile is empty.
> 
> If I run:
> python sweep-reads3.py pid-42391.fa pid-42391.fa
> 
> Behavior is correct.

Very weird.  I don't see anything obviously wrong in the script, which
probably means the bug is subtle and deep.

Two questions --

 - is it reproducible, i.e. do you get the same results every time you
   run it? (please yes)

 - can you reduce it to a smaller failing case than the full
   SRR-combined.fastq, e.g. a few hundred thousand reads?
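
One way to approach both questions is sketched below in Python. It is only
a sketch: head_fastq and file_md5 are hypothetical helper names, not part of
khmer or sweep-reads3.py, and the code assumes a plain, uncompressed FASTQ
file with exactly four lines per record. head_fastq writes the first N
reads to a smaller file for testing; file_md5 gives a checksum that can be
compared across the sweep output files of two separate runs to see whether
the result is the same each time.

```python
import hashlib
import itertools


def head_fastq(in_path, out_path, n_reads):
    """Copy the first n_reads records from in_path to out_path.

    Assumption: plain-text FASTQ with exactly four lines per record.
    Returns the number of complete records written.
    """
    written = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        while written < n_reads:
            record = list(itertools.islice(src, 4))
            if len(record) < 4:  # end of file, or a truncated last record
                break
            dst.writelines(record)
            written += 1
    return written


def file_md5(path):
    """Return the MD5 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, head_fastq("SRR-combined.fastq", "subset.fastq", 300000) would
produce a subset to sweep against (the output name is made up here). If the
subset still gives an empty sweep file, halving the read count repeatedly
should narrow down the failure. Taking reads from the top of the file
assumes the failure does not depend on reads further down; if the subset
works fine, a slice from elsewhere in the file may be needed instead.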

thanks,
--titus
-- 
C. Titus Brown, ctb at msu.edu



