[khmer] Extracting the original reads after diginorm + partitioning
Adina Chuang Howe
adina.chuang at gmail.com
Tue Mar 12 21:03:01 PDT 2013
It is definitely reproducible, and I have identified at least one sequence
that triggers it.
Here's what I'm seeing: (these are on the same HPC scratch space as before)
test1.fa contains only (37 bp):
>SRR172902.316476 42391
AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCC
SRR172902.316476.single.fq contains the same read, untrimmed (75 bp):
@SRR172902.316476 USI-EAS376:1:5:311:233 length=75
AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCCCCTGGGTAGAGGGTCAAACAGCGCAAGGCAACAGATCG
+SRR172902.316476 USI-EAS376:1:5:311:233 length=75
BBBBBABBB@BBBBBB>B>B=AA>B?A=4ABA;B>>AAA79;;068799;2;====>3>9>7922=;739#####
python sweep-reads3.py test1.fa SRR172902.316476.single.fq
results in an empty test1.fa.sweep
But...
python sweep-reads3.py SRR172902.316476.single.fq SRR172902.316476.single.fq
Results in
==> SRR172902.316476.single.fq.sweep3 <==
>SRR172902.316476
AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCCCCTGGGTAGAGGGTCAAACAGCGCAAGGCAACAGATCG
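For reference, a minimal sanity check that does not use khmer or sweep-reads3.py: it only confirms that the trimmed read is a prefix of the untrimmed one and that the two share k-mers. It assumes that matching is based on shared k-mers, and K = 32 is an assumption taken from the -k 32 in the earlier command; the helper name kmers is hypothetical. It ignores reverse complements and whatever hashing the script does internally.

```python
# Plain-Python sanity check; not khmer code and not how sweep-reads3.py works internally.

def kmers(seq, k):
    """Return the set of all length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

K = 32  # assumption: taken from "-k 32" in the earlier command
trimmed = "AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCC"
untrimmed = ("AAGACACCCTCACCCCTAGCTGCGCGAGGCCCTCTCC"
             "CCTGGGTAGAGGGTCAAACAGCGCAAGGCAACAGATCG")

shared = kmers(trimmed, K) & kmers(untrimmed, K)

print("trimmed length:", len(trimmed))
print("untrimmed length:", len(untrimmed))
print("trimmed is a prefix of untrimmed:", untrimmed.startswith(trimmed))
print("shared %d-mers: %d" % (K, len(shared)))
```

A 37 bp read has 37 - 32 + 1 = 6 windows of length 32, so if all of them show up as shared, the two inputs overlap at the sequence level and the empty output is not explained by the sequences themselves.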
Any clue?
On Sun, Mar 10, 2013 at 11:33 PM, C. Titus Brown <ctb at msu.edu> wrote:
> On Thu, Mar 07, 2013 at 02:10:17PM -0500, Adina Chuang Howe wrote:
> > Possible bug in sweep-reads...I'm not recovering the partitioned reads
> > from the original dataset.
> >
> > First observed this when I looked at lotsa partitions and trying to
> > recover swept reads - some swept files would show up empty:
> >
> > command:
> > python sweep-reads3.py -N 4 -k 32 -x 1e9
> > /mnt/research/gpgc/hmp-mock-partitions/001264-files/no-sweep-pids/pid*fa
> > /mnt/research/gpgc/hmp-mock-partitions/SRR-combined.fastq
> >
> > troubleshooting:
> > Then I looked at just one partition:
> > on HPC: /mnt/scratch/howead/test
> > python sweep-reads3.py pid-42391.fa SRR-combined.fastq
> >
> > And resulting sweepfile is empty.
> >
> > If I run:
> > python sweep-reads3.py pid-42391.fa pid-42391.fa
> >
> > Behavior is correct.
>
> Very weird. I don't see anything obviously wrong in the script, which
> just means it's a subtle and deep bug.
>
> Two questions --
>
> - is it reproducible, i.e. do you get the same results every time you
> run it? (please yes)
>
> - can you break it down to a smaller failure point than with SRR-combined,
> e.g. maybe a few hundred k reads?
>
> thanks,
> --titus
> --
> C. Titus Brown, ctb at msu.edu
>