[khmer] normalize-by-median.py Hanging

Daniel Standage daniel.standage at gmail.com
Wed Sep 17 13:38:29 PDT 2014


Here it is: https://gist.github.com/standage/4222de94bd695f23f673

Forgot to mention, running khmer 1.1
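
For anyone who wants to check an output file for the kind of truncation described in the quoted messages below, here is a minimal sketch. It is not part of khmer. The function name `find_truncated_record` is hypothetical, and it assumes strict four-line Fastq records (no wrapped sequence lines).

```python
import io


def find_truncated_record(handle):
    """Scan a four-line-per-record Fastq stream.

    Returns (n_complete, problem): the number of complete records read
    before stopping, and a description of the first problem found, or
    None if the stream ended cleanly. Hypothetical helper, not a khmer API.
    """
    n_complete = 0
    while True:
        header = handle.readline()
        if not header:
            return n_complete, None  # clean end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\n")
        record_no = n_complete + 1
        if not header.startswith("@"):
            return n_complete, "record %d: header does not start with '@'" % record_no
        if not plus.startswith("+"):
            return n_complete, "record %d: missing '+' separator line" % record_no
        if len(seq) != len(qual):
            return n_complete, (
                "record %d: sequence has %d bases but quality has %d characters"
                % (record_no, len(seq), len(qual))
            )
        n_complete += 1


# Small in-memory demonstration: the second record's quality line is cut off.
demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nACGT\n+\nII")
print(find_truncated_record(demo))
```

On a real file, pass an open text handle, e.g. `find_truncated_record(open("SRR494178_int.fastq.keep"))`.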

Thanks,
Daniel


--
Daniel S. Standage
Ph.D. Candidate
Computational Genome Science Laboratory
Indiana University

On Wed, Sep 17, 2014 at 4:31 PM, C. Titus Brown <ctb at msu.edu> wrote:

> On Wed, Sep 17, 2014 at 04:27:09PM -0400, Daniel Standage wrote:
> > Before running norm-by-median, I
> >
> >    - Downloaded the SRA file
> >    - Used fastq-dump to create paired Fastq files
> >    - Used interleave-reads to create a Fastq file in the One True Format
> >
> > All of the Fastq files seem to be fine, i.e. none appear truncated. Memory
> > usage is remaining constant and CPU utilization is 100%, but the weird thing
> > is that, as far as I can tell, the norm-by-median script is complete. It has
> > processed all the input, given a final report, and written all of the kept
> > reads to output, except that the last read is missing and the second-to-last
> > read is cut off.
>
> Sounds like a classic fencepost error :(.
>
> Could you send me (on or off the list) the SRA commands you used and
> the command line you ran with diginorm?  Then we'll see if we can replicate
> on our own hardware.
>
> cheers,
> --titus
>
> > On Wed, Sep 17, 2014 at 4:21 PM, C. Titus Brown <ctb at msu.edu> wrote:
> >
> > > Hi Daniel,
> > >
> > > sounds like an infinite loop of some sort :(.
> > >
> > > A few questions --
> > >
> > > What version of khmer are you using?
> > >
> > > Have you run the reads file through any other software?  I'm worried
> > > that the file is truncated in some way.
> > >
> > > Do you know how far through your reads file it's gotten?
> > >
> > > Is memory usage increasing or remaining constant?
> > >
> > > thanks,
> > > --titus
> > >
> > > On Wed, Sep 17, 2014 at 04:16:37PM -0400, Daniel Standage wrote:
> > > > Hi all,
> > > >
> > > > I am seeing some strange behavior running normalize-by-median.py. The
> > > > program seemed to complete successfully after 30-45 minutes, but then it
> > > > just hung there. It's now been at least 90 minutes and it's continuing to
> > > > hang. The output file seems to contain all the data except the last record,
> > > > and the second-to-last record is cut off.
> > > >
> > > > (khmer-env)[standage at bggnomic qc] tail SRR494178_int.fastq.keep
> > > > +
> > > > GBGED>>E##################################################################################
> > > > @SRR494178.12090255/1
> > > > TCGAGGACNACCTTTTGACCCTTCTGCAACCTTTGAATTTCAGACATCAAACTCTCCCTCTGTCGTGTCTCCNNCAATGATGGGTCGGGC
> > > > +
> > > > IIIIIGGG#GGGGGGIIIIIIIIIIIIIIIIGIHIIIIIGIIIIIIIIIIIIIHIIIIHIEGHHIFIHII=?##?;9>>;IGBFFGBD8G
> > > > @SRR494178.12090255/2
> > > > GATTCCGTCACCGAGGAGTATCCGTTGCCGAGGTTGTGCGTCTGTCGAACCTGGCCGTTCTTTTTGACCGTGTAGGTGCCGCCGTTGATC
> > > > +
> > > > IIIIIIHIIIIIIIIIBIHHIIIGIIIIIII(khmer-env)[standage at bggnomic qc]
> > > >
> > > > Any ideas as to what could be causing this?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > PS.
> > > >
> > > >    - OS: Fedora 20 with lots of RAM (100s of GB)
> > > >    - Command: normalize-by-median.py -k 17 -p -N 4 -x 8e9 SRR494178_int.fastq
> > > >    - Data: http://www.ncbi.nlm.nih.gov/sra/?term=SRR494178
> > > >
> > > >
> > > > --
> > > > Daniel S. Standage
> > > > Ph.D. Candidate
> > > > Computational Genome Science Laboratory
> > > > Indiana University
> > >
> > > --
> > > C. Titus Brown, ctb at msu.edu
> > >
>
> --
> C. Titus Brown, ctb at msu.edu
>