[khmer] Digital normalization

C. Titus Brown ctb at msu.edu
Mon Aug 26 10:46:20 PDT 2013


Yep.  If you regenerate the .se. file using the latest version of khmer
from bleeding-edge, you shouldn't have this problem any more;
just make sure you update every copy of khmer you have.  The simplest
approach would be to delete your checkout and re-clone.
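
In case it helps to check the other files before re-running, here is a
minimal sketch (not part of khmer or screed) that reports the first record
header that does not start with '@', which is the condition screed complains
about in the traceback quoted below.  It assumes Python 3 and strict 4-line
FASTQ records with no wrapped sequences; the function name
first_non_fastq_record is made up for illustration.

```python
import gzip
import sys


def first_non_fastq_record(path):
    """Return (line_number, header) for the first record header that does
    not start with '@', or None if every header looks like FASTQ.

    Assumes 4-line FASTQ records; a record cut short at the end of the
    file is not reported.
    """
    opener = gzip.open if path.endswith(".gz") else open
    line_number = 0
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                return None  # reached end of file without a bad header
            line_number += 1
            if not header.startswith("@"):
                return line_number, header.rstrip("\n")
            # Skip the sequence, '+' separator, and quality lines.
            for _ in range(3):
                if handle.readline():
                    line_number += 1


if __name__ == "__main__":
    # Usage: python check_fastq.py file1.fq.gz file2.fq.gz ...
    for path in sys.argv[1:]:
        print(path, first_non_fastq_record(path))
```

A file that switches to FASTA-style records partway through, like the one in
the tail output quoted below, should come back with the line number of the
first '>' header; a clean file gives None.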

best,
-titus

On Mon, Aug 26, 2013 at 11:43:50AM -0600, Joann Diray Arce wrote:
> Yup, I guess that's the problem: the file is a mixture of regular fastq
> records and these records without quality scores.
> 
> >HISEQ2000B:459:D1KYNACXX:4:2308:19383:200665/2
> AAGCAGTGGTATCAACGCAGAGTACATGGGAAGCTGCCTTCTGCTGGATTCCAAGTTGAGGTTGTCCTTGTCGACCATGATGGTTCACTAACACCACAAC
> >HISEQ2000B:459:D1KYNACXX:4:2308:20366:200633/2
> ATTCTCCTCAAGGTACTTGGCAACTTCAGTAGTTGATCTCACTTTCTTCCCAGTAGGAGTCACATAATGAGCATCCAATTTAGAGAAGTCCCTCCTAAGA
> >HISEQ2000B:459:D1KYNACXX:4:2308:20631:200524/2
> TCTGAGCACAGGATTATCCCTATTTTTAGGGAGGTTTGTGTTCTTCAACTTTCAGAGGGAGAATGTGGCAAAGCAAGGATTGCCGGAGCAAAATGGGATG
> >HISEQ2000B:459:D1KYNACXX:4:2308:20685:200640/2
> ACAAGTAGCACATTGTGATATGATAGCCTCAACATCAACAACCATCTTAGGCCAATAAAAATGTTCTTGTAACATAGTCAATGTCTTCGTAGTGCTATAC
> >HISEQ2000B:459:D1KYNACXX:4:2308:20852:200707/2
> ATTGACACGAGTAAACTTGTCAACGTTAAAGTCGGAACCACATATAACCCCTTGGTTGTACACCTTTTTTATGGATTTAACAATGTTCATAGGATGTTTG
> 
> 
> On Mon, Aug 26, 2013 at 11:41 AM, C. Titus Brown <ctb at msu.edu> wrote:
> 
> > Hmm, OK.  What does
> >
> > gunzip -c *.se.qc.fq.gz | tail
> >
> > show?
> >
> > thanks,
> > --titus
> >
> > On Mon, Aug 26, 2013 at 11:39:10AM -0600, Joann Diray Arce wrote:
> > > It looks like a regular fastq file to me though....
> > >
> > > @HISEQ2000B:459:D1KYNACXX:4:1101:1594:2125 1:N:0:CGGAAA
> > > ATGCATAACTTTCAGAATGGCGCCACCCAATCTCATGGTTAACAGACGACTCAACCAAAATTAAAGAAAG
> > > +
> > > CCCFFFFFHHHHHJJJIHIIIJJJJJJJIGIIJIEIGGHIEDGGIIIJIGHGIIIIJJHHHHHHFFFFFE
> > > @HISEQ2000B:459:D1KYNACXX:4:1101:1901:2147 1:N:0:CGGAAT
> > > ATATGCTCCTAGGAGCATAATATACTTATCGAGTCCCAACTGAGCTGGTTAAGGTTTGGTGTATGTCAAT
> > > +
> > > CCCFFFFFHGFHHGHGIJJGGHIIIIIGIHIFIGGIJJJJJGIIJJJIFHIGHIGHJGI8C;=FFFCGDH
> > > @HISEQ2000B:459:D1KYNACXX:4:1101:3782:2117 1:N:0:CGGAAT
> > > CAAATGGAAAAAAAAAACGAAAAAGAAACAAAATTCGAATTCAACTCAACTCAACAGCAAAAAAAAATGA
> > >
> > >
> > > On Mon, Aug 26, 2013 at 11:27 AM, C. Titus Brown <ctb at msu.edu> wrote:
> > >
> > > > Hi Joann,
> > > >
> > > > could you send us the output of
> > > >
> > > > gunzip -c *.se.qc.fq.gz | head
> > > >
> > > > pls? thanks! I wonder what's in that file if it's not FQ?
> > > >
> > > > --titus
> > > >
> > > > On Mon, Aug 26, 2013 at 10:15:05AM -0600, Joann Diray Arce wrote:
> > > > > > I actually ran this last week. Do I need to reinstall khmer, or should I
> > > > > > start from my original fastq and see how it goes?
> > > > >
> > > > >
> > > > > Estimated memory usage is 1.2e+10 bytes (n_hashes x min_hashsize)
> > > > > --------
> > > > > Traceback (most recent call last):
> > > > > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/khmer/scripts/normalize-by-median.py", line 156, in <module>
> > > > > >     main()
> > > > > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/khmer/scripts/normalize-by-median.py", line 85, in main
> > > > > >     for n, batch in enumerate(batchwise(screed.open(input_filename), batch_size)):
> > > > > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/lib/python2.7/site-packages/screed/fastq.py", line 21, in fastq_iter
> > > > >     raise IOError("Bad FASTQ format: no '@' at beginning of line")
> > > > > IOError: Bad FASTQ format: no '@' at beginning of line
> > > > >
> > > > > > My output files usually end at 9%.
> > > > >
> > > > > Here is my code:
> > > > > > python /..../khmer/scripts/normalize-by-median.py -k 20 -C 20 -N 4 -x 3e9 \
> > > > > >     --loadhash normC20k20.kh --savehash normC20k20.kh *.se.qc.fq.gz
> > > > >
> > > > > --
> > > > > *Joann Diray Arce*
> > > > > Graduate Student
> > > > > Department of Microbiology and Molecular Biology
> > > > > Brigham Young University
> > > > > (801)7352371
> > > >
> > > > > _______________________________________________
> > > > > khmer mailing list
> > > > > khmer at lists.idyll.org
> > > > > http://lists.idyll.org/listinfo/khmer
> > > >
> > > >
> > > > --
> > > > C. Titus Brown, ctb at msu.edu
> > > >
> > >
> > >
> > >
> > > --
> > > *Joann Diray Arce*
> > > Graduate Student
> > > Department of Microbiology and Molecular Biology
> > > Brigham Young University
> > > (801)7352371
> >
> > --
> > C. Titus Brown, ctb at msu.edu
> >
> 
> 
> 
> -- 
> *Joann Diray Arce*
> Graduate Student
> Department of Microbiology and Molecular Biology
> Brigham Young University
> (801)7352371

-- 
C. Titus Brown, ctb at msu.edu



