[khmer] Suggestion on check_space test

C. Titus Brown ctb at msu.edu
Thu Sep 25 09:39:27 PDT 2014


Sorry for top posting; traveling.

For some scripts, input size is a poor predictor of output size, which will usually be much smaller.  For other scripts, output size is not predictable. I think this could usefully be discussed in an issue before work on a PR begins.

Also, a force flag is higher priority than all of this; with a force flag, it matters less if we get the space-checking details wrong.
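
To make the two ideas concrete, here is a minimal sketch of a space check that looks only at the output directory's filesystem and can be overridden by a force flag. It is not khmer's actual check_space code: the function names, the `force` parameter, and the use of the largest input size as the estimate (the heuristic proposed in the quoted message below) are all assumptions for illustration. As noted above, input size may badly overestimate the real output size for some scripts.

```python
import os
import shutil
import sys


def space_ok(needed_bytes, free_bytes, force=False):
    """Decide whether to proceed, given an estimate and the free space.

    Hypothetical helper: with force=True a shortfall only produces a
    warning, so a wrong size estimate cannot block the run.
    """
    if free_bytes >= needed_bytes:
        return True
    if force:
        print("warning: estimated %d bytes needed, %d free; continuing "
              "because force is set" % (needed_bytes, free_bytes),
              file=sys.stderr)
        return True
    return False


def check_output_space(input_paths, out_dir, force=False):
    """Check free space on the filesystem holding out_dir only.

    Assumption: the largest input file is used as a rough upper bound
    on output size; the input disk is never inspected.
    """
    needed = max(os.path.getsize(path) for path in input_paths)
    free = shutil.disk_usage(out_dir).free
    return space_ok(needed, free, force)
```

A script would call something like check_output_space(args.input_filenames, out_dir, force=args.force) before writing, where the argument names are again hypothetical. Keeping the decision in space_ok separate from the disk query makes the force behaviour easy to test without a full disk.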

--titus

---
C. Titus Brown, ctb at msu.edu

> On Sep 25, 2014, at 11:32, Ramakrishnan Srinivasan <ramrs at nyu.edu> wrote:
> 
> Scripts with an option to write to a different directory are:
> 
> a. count-median.py
> b. count-overlap.py
> c. do-partition.py
> d. extract-long-sequences.py
> e. extract-paired-reads.py
> f. extract-partitions.py
> g. fastq-to-fasta.py
> h. filter-abund.py
> i. interleave-reads.py
> j. load-into-counting.py
> k. normalize-by-median.py
> l. sample-reads-randomly.py
> 
> The solution is to add an optional parameter that holds the output location. We then check that the output directory has free space at least equal to the size of the largest input file.
> 
> If it's OK with you, I can take this up.
> 
> Also, the script extract-paired-reads.py uses a mix of argparse and sys.argv[]. Should we modify it to use a single approach?
> 
> --
> Ram
> 
>> On Thu, Sep 25, 2014 at 12:08 PM, Ramakrishnan Srinivasan <ramrs at nyu.edu> wrote:
>> I think this is a case of passing the output file path in scripts where -in path can be different from -out path (such as in normalize-by-median).
>> 
>> --
>> Ram
>> 
>>> On Thu, Sep 25, 2014 at 12:06 PM, C. Titus Brown <ctb at msu.edu> wrote:
>>> On Thu, Sep 25, 2014 at 10:58:46AM -0500, Adina Chuang Howe wrote:
>>> > Hi team,
>>> >
>>> > I was running diginorm today on two different disks (one for the
>>> > intermediate normalized files) and one on another disk (containing the raw
>>> > sequences) .  I think this will be fairly typical in the future.
>>> > Currently, a test (file.py check_space) will error out, given this setup.
>>> > I think it'd be nice to be able to bypass this check or at least make it
>>> > more robust (to checking the write-out disk only).
>>> >
>>> > Low priority, but notable, hopefully helpful.
>>> 
>>> Agreed -- see
>>> 
>>> https://github.com/ged-lab/khmer/issues/399
>>> 
>>> --titus
>>> --
>>> C. Titus Brown, ctb at msu.edu
>>> 
> 