[khmer] Suggestion on check_space test

Ramakrishnan Srinivasan ramrs at nyu.edu
Thu Sep 25 09:59:37 PDT 2014


Agreed. Does this look OK? https://github.com/ged-lab/khmer/issues/618

--
Ram

On Thu, Sep 25, 2014 at 12:39 PM, C. Titus Brown <ctb at msu.edu> wrote:

> Sorry for top posting; traveling.
>
> For some scripts, input size is a poor predictor of output size, which will
> usually be much smaller. For other scripts, output size is not
> predictable at all. I think this could usefully be discussed in an issue
> before anyone starts a PR.
>
> Also, a force flag is a higher priority than all of this; with a force flag,
> it matters less if we get the details of space checking wrong.
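A minimal sketch of how a force flag could relax the space check, assuming a hypothetical helper `handle_space_shortfall` (not part of khmer) that is handed the free and estimated byte counts. A shortfall prints a warning, and the script continues only if `force` is set.

```python
import sys


def handle_space_shortfall(free_bytes, needed_bytes, force=False):
    """Hypothetical helper: decide what to do when free space looks short.

    Returns True if the script should continue, False if it should stop.
    """
    if free_bytes >= needed_bytes:
        return True  # enough space by our (possibly rough) estimate
    print("WARNING: estimated {0} bytes needed but only {1} bytes free"
          .format(needed_bytes, free_bytes), file=sys.stderr)
    if force:
        # The user explicitly asked to proceed; the estimate may be wrong anyway.
        print("Continuing because --force was given.", file=sys.stderr)
        return True
    return False
```

A calling script would presumably exit with a non-zero status when this returns False, so a bad estimate only costs the user one rerun with the force flag.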
>
> --titus
>
> ---
> C. Titus Brown, ctb at msu.edu
>
> On Sep 25, 2014, at 11:32, Ramakrishnan Srinivasan <ramrs at nyu.edu> wrote:
>
> Scripts with an option to write to a different directory are:
>
> a. count-median.py
> b. count-overlap.py
> c. do-partition.py
> d. extract-long-sequences.py
> e. extract-paired-reads.py
> f. extract-partitions.py
> g. fastq-to-fasta.py
> h. filter-abund.py
> i. interleave-reads.py
> j. load-into-counting.py
> k. normalize-by-median.py
> l. sample-reads-randomly.py
>
> One solution is to add an optional parameter that holds the output location.
> We would then check that the output directory has free space at least equal
> to the size of the largest input file.
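A rough sketch of the proposed check, assuming a hypothetical `check_space(input_files, output_dir)` signature (the existing function in file.py may look different). It measures free space on the output directory's filesystem with `shutil.disk_usage` and uses the largest input file as the estimate of space needed, as proposed above.

```python
import os
import shutil


def check_space(input_files, output_dir="."):
    """Hypothetical sketch: compare free space in output_dir with the largest input.

    Returns (free_bytes, needed_bytes) so the caller can decide what to do.
    """
    # The disk that matters is the one being written to, not the input disk.
    free_bytes = shutil.disk_usage(os.path.abspath(output_dir)).free
    # Rough estimate from the proposal: the size of the largest single input file.
    needed_bytes = max((os.path.getsize(f) for f in input_files), default=0)
    return free_bytes, needed_bytes
```

Returning both numbers rather than raising keeps the policy (warn, exit, or honor a force flag) in the calling script. For scripts whose output is much smaller than their input, this estimate will err on the high side.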
>
> If it's OK with you, I can take this up.
>
> Also, the script extract-paired-reads.py uses a mix of argparse and
> sys.argv[]. Should we modify it to use a single approach?
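For the argparse question, a sketch of a parser that takes every argument through argparse instead of indexing sys.argv[]. The argument names here (`infile`, `--output-dir`, `--force`) are hypothetical and not necessarily those of extract-paired-reads.py.

```python
import argparse


def build_parser():
    """Build a parser that handles every argument via argparse (no sys.argv[] indexing).

    The argument names are illustrative only.
    """
    parser = argparse.ArgumentParser(
        description="Hypothetical example: all arguments parsed by argparse.")
    parser.add_argument("infile", help="input sequence file")
    parser.add_argument("-o", "--output-dir", default=".",
                        help="directory for output files (default: current)")
    parser.add_argument("-f", "--force", action="store_true",
                        help="continue even if the free-space check fails")
    return parser
```

For example, `build_parser().parse_args(["reads.fq", "-o", "out"])` gives `infile='reads.fq'`, `output_dir='out'` and `force=False`, and the same parser also yields the `--help` text for free.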
>
> --
> Ram
>
> On Thu, Sep 25, 2014 at 12:08 PM, Ramakrishnan Srinivasan <ramrs at nyu.edu>
> wrote:
>
>> I think this is a case of passing the output file path in scripts where
>> the input path can differ from the output path (such as normalize-by-median).
>>
>> --
>> Ram
>>
>> On Thu, Sep 25, 2014 at 12:06 PM, C. Titus Brown <ctb at msu.edu> wrote:
>>
>>> On Thu, Sep 25, 2014 at 10:58:46AM -0500, Adina Chuang Howe wrote:
>>> > Hi team,
>>> >
>>> > I was running diginorm today on two different disks (one for the
>>> > intermediate normalized files and another containing the raw
>>> > sequences). I think this will be fairly typical in the future.
>>> > Currently, a test (file.py check_space) will error out given this
>>> > setup. I think it'd be nice to be able to bypass this check, or at
>>> > least make it more robust (by checking only the write-out disk).
>>> >
>>> > Low priority, but notable, hopefully helpful.
>>>
>>> Agreed -- see
>>>
>>> https://github.com/ged-lab/khmer/issues/399
>>>
>>> --titus
>>> --
>>> C. Titus Brown, ctb at msu.edu
>>>
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
>>>
>>
>>
>

