[khmer] Digital normalization

C. Titus Brown ctb at msu.edu
Mon Aug 26 10:41:18 PDT 2013


Hmm, OK.  What does

gunzip -c *.se.qc.fq.gz | tail

show?
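
If eyeballing head/tail output doesn't turn anything up, a small script can scan a whole file for the first record that breaks the format, which is roughly the condition behind the "no '@' at beginning of line" error in the traceback quoted below. This is only a minimal sketch and not part of khmer or screed: the function name find_bad_fastq_record is made up for illustration, and it assumes Python 3 and strict four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import itertools


def find_bad_fastq_record(path):
    """Return (record_number, reason) for the first malformed record, or None.

    Hypothetical helper: assumes a gzipped file of strict 4-line FASTQ records.
    """
    with gzip.open(path, "rt") as handle:
        record_number = 0
        while True:
            try:
                # Pull the next four lines (one record) from the stream.
                lines = [line.rstrip("\r\n")
                         for line in itertools.islice(handle, 4)]
            except EOFError:
                # gzip raises EOFError when the compressed stream is cut short.
                return record_number + 1, "gzip stream is truncated"
            if not lines:
                return None  # clean end of file, nothing wrong found
            record_number += 1
            if len(lines) < 4:
                return record_number, "incomplete record at end of file"
            header, seq, plus, qual = lines
            if not header.startswith("@"):
                return record_number, "header does not start with '@'"
            if not plus.startswith("+"):
                return record_number, "third line does not start with '+'"
            if len(seq) != len(qual):
                return record_number, "sequence and quality lengths differ"
```

Calling it once per file (for example in a loop over glob.glob("*.se.qc.fq.gz")) would show which file, and which record in it, is the first to go wrong.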

thanks,
--titus

On Mon, Aug 26, 2013 at 11:39:10AM -0600, Joann Diray Arce wrote:
> It looks like a regular fastq file to me though....
> 
> @HISEQ2000B:459:D1KYNACXX:4:1101:1594:2125 1:N:0:CGGAAA
> ATGCATAACTTTCAGAATGGCGCCACCCAATCTCATGGTTAACAGACGACTCAACCAAAATTAAAGAAAG
> +
> CCCFFFFFHHHHHJJJIHIIIJJJJJJJIGIIJIEIGGHIEDGGIIIJIGHGIIIIJJHHHHHHFFFFFE
> @HISEQ2000B:459:D1KYNACXX:4:1101:1901:2147 1:N:0:CGGAAT
> ATATGCTCCTAGGAGCATAATATACTTATCGAGTCCCAACTGAGCTGGTTAAGGTTTGGTGTATGTCAAT
> +
> CCCFFFFFHGFHHGHGIJJGGHIIIIIGIHIFIGGIJJJJJGIIJJJIFHIGHIGHJGI8C;=FFFCGDH
> @HISEQ2000B:459:D1KYNACXX:4:1101:3782:2117 1:N:0:CGGAAT
> CAAATGGAAAAAAAAAACGAAAAAGAAACAAAATTCGAATTCAACTCAACTCAACAGCAAAAAAAAATGA
> 
> 
> On Mon, Aug 26, 2013 at 11:27 AM, C. Titus Brown <ctb at msu.edu> wrote:
> 
> > Hi Joann,
> >
> > could you send us the output of
> >
> > gunzip -c *.se.qc.fq.gz | head
> >
> > pls? thanks! I wonder what's in that file if it's not FQ?
> >
> > --titus
> >
> > On Mon, Aug 26, 2013 at 10:15:05AM -0600, Joann Diray Arce wrote:
> > > I actually ran this last week. Do I need to reinstall khmer? Or should I
> > > start from my original FASTQ files and see how it goes?
> > >
> > >
> > > Estimated memory usage is 1.2e+10 bytes (n_hashes x min_hashsize)
> > > --------
> > > Traceback (most recent call last):
> > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/khmer/scripts/normalize-by-median.py", line 156, in <module>
> > >     main()
> > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/khmer/scripts/normalize-by-median.py", line 85, in main
> > >     for n, batch in enumerate(batchwise(screed.open(input_filename), batch_size)):
> > >   File "/fslhome/jdiray/compute/SuaedaIllumina2013/lib/python2.7/site-packages/screed/fastq.py", line 21, in fastq_iter
> > >     raise IOError("Bad FASTQ format: no '@' at beginning of line")
> > > IOError: Bad FASTQ format: no '@' at beginning of line
> > >
> > > My output files usually end at 9%.
> > >
> > > Here is my code:
> > > python /..../khmer/scripts/normalize-by-median.py -k 20 -C 20 -N 4 -x 3e9
> > > --loadhash normC20k20.kh --savehash normC20k20.kh *.se.qc.fq.gz
> > >
> > > --
> > > *Joann Diray Arce*
> > > Graduate Student
> > > Department of Microbiology and Molecular Biology
> > > Brigham Young University
> > > (801)7352371
> >
> > > _______________________________________________
> > > khmer mailing list
> > > khmer at lists.idyll.org
> > > http://lists.idyll.org/listinfo/khmer
> >
> >
> > --
> > C. Titus Brown, ctb at msu.edu
> >
> 
> 
> 
> -- 
> *Joann Diray Arce*
> Graduate Student
> Department of Microbiology and Molecular Biology
> Brigham Young University
> (801)7352371

-- 
C. Titus Brown, ctb at msu.edu



