[khmer] normalize-by-median.py Hanging

C. Titus Brown ctb at msu.edu
Wed Sep 17 13:31:30 PDT 2014


On Wed, Sep 17, 2014 at 04:27:09PM -0400, Daniel Standage wrote:
> Before running norm-by-median, I
> 
>    - Downloaded SRA file
>    - Used fastq-dump to create paired Fastq files
>    - Used interleave-reads to create a Fastq file in the One True Format
> 
> All of the Fastq files seem to be fine, i.e. none appear truncated. Memory
> usage is remaining constant and CPU utilization is 100%, but the weird thing
> is that as far as I can tell the norm-by-median script is complete. It has
> processed all the input, given a final report, and written all of the kept
> reads to output, except that the last read is missing and the second-to-last
> read is cut off.

Sounds like a classic fencepost error :(.

Could you send me (on or off the list) the SRA commands you used and
the command line you ran with diginorm?  Then we'll see if we can replicate
on our own hardware.
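
A quick way to confirm exactly where the output breaks is to walk the
.keep file four lines at a time. What follows is only a minimal sketch,
not part of khmer: the function name check_fastq is made up for this
example, and it assumes strict four-line FASTQ records (no wrapped
sequence lines).

```python
import io


def check_fastq(handle):
    """Scan a four-line-per-record FASTQ stream.

    Returns (n_complete_records, problem), where problem is None if the
    stream ends cleanly, else a short description of the first bad record.
    Hypothetical helper for illustration; not a khmer function.
    """
    n = 0
    while True:
        header = handle.readline()
        if not header:
            return n, None  # clean end of file
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        label = "record %d" % (n + 1)
        if not header.startswith("@"):
            return n, label + ": header does not start with '@'"
        if not plus.startswith("+"):
            return n, label + ": missing '+' separator line"
        if len(seq.rstrip("\n")) != len(qual.rstrip("\n")):
            return n, label + ": sequence and quality lengths differ"
        n += 1


# Synthetic two-record example (not real data): the second record's
# quality line is cut short, like the final record in the tail output
# quoted further down.
good = "@read1/1\nACGT\n+\nIIII\n"
truncated = good + "@read1/2\nACGT\n+\nII"
print(check_fastq(io.StringIO(good)))
print(check_fastq(io.StringIO(truncated)))
```

On a real file, pass an open text handle instead of the StringIO
objects, e.g. open("SRR494178_int.fastq.keep"). Running the same check
on the interleaved input would rule out truncation upstream.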

cheers,
--titus

> On Wed, Sep 17, 2014 at 4:21 PM, C. Titus Brown <ctb at msu.edu> wrote:
> 
> > Hi Daniel,
> >
> > sounds like an infinite loop of some sort :(.
> >
> > A few questions --
> >
> > What version of khmer are you using?
> >
> > Have you run the reads file through any other software?  I'm worried
> > that the file is truncated in some way.
> >
> > Do you know how far through your reads file it's gotten?
> >
> > Is memory usage increasing or remaining constant?
> >
> > thanks,
> > --titus
> >
> > On Wed, Sep 17, 2014 at 04:16:37PM -0400, Daniel Standage wrote:
> > > Hi all,
> > >
> > > I am seeing some strange behavior running normalize-by-median.py. The
> > > program seemed to complete successfully after 30-45 minutes, but then it
> > > just hung there. It's now been at least 90 minutes and it's continuing to
> > > hang. The output file seems to contain all the data except the last
> > > record, and the second-to-last record is cut off.
> > >
> > > (khmer-env)[standage at bggnomic qc] tail SRR494178_int.fastq.keep
> > > +
> > > GBGED>>E##################################################################################
> > > @SRR494178.12090255/1
> > > TCGAGGACNACCTTTTGACCCTTCTGCAACCTTTGAATTTCAGACATCAAACTCTCCCTCTGTCGTGTCTCCNNCAATGATGGGTCGGGC
> > > +
> > > IIIIIGGG#GGGGGGIIIIIIIIIIIIIIIIGIHIIIIIGIIIIIIIIIIIIIHIIIIHIEGHHIFIHII=?##?;9>>;IGBFFGBD8G
> > > @SRR494178.12090255/2
> > > GATTCCGTCACCGAGGAGTATCCGTTGCCGAGGTTGTGCGTCTGTCGAACCTGGCCGTTCTTTTTGACCGTGTAGGTGCCGCCGTTGATC
> > > +
> > > IIIIIIHIIIIIIIIIBIHHIIIGIIIIIII(khmer-env)[standage at bggnomic qc]
> > >
> > > Any ideas as to what could be causing this?
> > >
> > > Thanks,
> > > Daniel
> > >
> > > PS.
> > >
> > >    - OS: Fedora 20 with lots of RAM (100s of GB)
> > >    - Command: normalize-by-median.py -k 17 -p -N 4 -x 8e9 SRR494178_int.fastq
> > >    - Data: http://www.ncbi.nlm.nih.gov/sra/?term=SRR494178
> > >
> > >
> > > --
> > > Daniel S. Standage
> > > Ph.D. Candidate
> > > Computational Genome Science Laboratory
> > > Indiana University
> >
> > > _______________________________________________
> > > khmer mailing list
> >
> > --
> > C. Titus Brown, ctb at msu.edu
> >

-- 
C. Titus Brown, ctb at msu.edu


