[khmer] How to speed up the filter-below-abund script ?

Eric McDonald emcd.msu at gmail.com
Wed Mar 13 08:54:43 PDT 2013


Thanks for the information, Alexis.

The .below is an output file; it is empty because the script crashed before
anything was written to it.

I have some more ideas about what may be happening. But, rather than
speculating wildly, I want to reason in a slightly more data-driven manner.
So, could you please attach a copy of your job script, which has all of the
commands that you are running? (Among other things, I want to see the
parameters that you are using to construct the .kh file.) Also, can you
please show us the output of the following command:
  head -20 /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep

Thanks,



On Wed, Mar 13, 2013 at 10:48 AM, Alexis Groppi <
alexis.groppi at u-bordeaux2.fr> wrote:

>  Another clue :
>
> A empty .below file is generated (174r1_prinseq_good_bFr8.fasta.keep.below)
>
> Alexis
>
>
> Le 13/03/2013 15:13, Alexis Groppi a écrit :
>
> Hi,
>
> Le 13/03/2013 14:12, Eric McDonald a écrit :
>
> Hi Alexis,
>
>  First, let me say thank you for being patient and working with us in
> spite of all the problems you are encountering.
>
>
> That's bioinformatician life ;)
>
>
>  With regards to the floating point exception, I see several
> opportunities for a division-by-zero condition in the threading utilities
> used by the script. These opportunities exist if an input file is empty.
> (The problem may be coming from another place, but this would be my first
> guess.) What does the following command say:
>
>    ls -lh /scratch/ag/khmer/174r1_table.kh /mnt/var/home/ag/174r1_
> prinseq_good_bFr8.fasta.keep
>
>
>  The result : (the files are not empty)
> -rw-r--r-- 1 ag users 299M 12 mars  20:54
> /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
> -rw-r--r-- 1 ag users 141G 12 mars  21:05 /scratch/ag/khmer/174r1_table.kh
>
>
> Also, since you appear to be using TORQUE as your resource manager/batch
> system, could you please attach the complete output and error files for the
> job? (These files should be of the form <job_name>.o2693 and
> <job_name>.e2693, where <job_name> is the name of your job. There may only
> be one or the other of these files, depending on site defaults and whether
> you specified "-j oe" or "-j eo" in your job submission.)
>
>
> I re run the job since I have deleted previous (2693) err/out files.
> Here is the new file (merged with the option -j oe in the bash script) :
>
> #############################
> User: ag
> Date: Wed Mar 13 14:59:21 CET 2013
> Host: rainman.cbib.u-bordeaux2.fr
> Directory: /mnt/var/home/ag
> PBS_JOBID: 2695.rainman
> PBS_O_WORKDIR: /mnt/var/home/ag
> PBS_NODEFILE:  rainman
> #############################
> #############################
> Debut filter-below-abund: Wed Mar 13 14:59:21 CET 2013
> starting threads
> starting writer
> loading...
> ... filtering 0
> /var/lib/torque/mom_priv/jobs/2695.rainman.SC: line 49: 54757 Floating
> point exception(core dumped) ./khmer-BETA/sandbox/fi
> lter-below-abund.py /scratch/ag/khmer/174r1_table.kh/mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>
> real    3m54.873s
> user    0m0.085s
> sys     2m2.180s
> Date fin: Wed Mar 13 15:03:15 CET 2013
> Job finished
>
> Thanks again for your help :)
>
> Alexis
>
>
>  Thanks,
>   Eric
>
>
>
> On Wed, Mar 13, 2013 at 5:38 AM, Alexis Groppi <
> alexis.groppi at u-bordeaux2.fr> wrote:
>
>>  Hi Eric,
>>
>> Thanks for your answer.
>> But unfortunately, after many attempts I'm getting this error :
>>
>> starting threads
>> starting writer
>> loading...
>> ... filtering 0
>> /var/lib/torque/mom_priv/jobs/2693.rainman.SC: line 46: 63657 Floating
>> point exception(core dumped) ./khmer-BETA/sandbox/filter-below-abund.py
>> /scratch/ag/khmer/174r1_table.kh/mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>>
>> real    3m30.163s
>> user    0m0.088s
>>
>> Your opinion ?
>>
>> Thanks
>>
>> Alexis
>>
>>
>> Le 13/03/2013 00:55, Eric McDonald a écrit :
>>
>> Hi Alexis,
>>
>>  One way to get the 'bleeding-edge' branch is to clone it into a fresh
>> directory; for example:
>>    git clone http://github.com/ged-lab/khmer.git -b bleeding-edge
>> khmer-BETA
>>
>>  Assuming you already have a clone of the 'ged-lab/khmer' repo, then you
>> should also be able to do:
>>   git fetch origin
>>   git checkout bleeding-edge
>> Depending on how old your Git client is and what its defaults are, you
>> may have to do the following instead:
>>   git checkout --track -b bleeding-edge origin/bleeding-edge
>>
>>  Hope this helps,
>>   Eric
>>
>>
>> On Tue, Mar 12, 2013 at 11:32 AM, Alexis Groppi <
>> alexis.groppi at u-bordeaux2.fr> wrote:
>>
>>>
>>> Le 12/03/2013 16:16, C. Titus Brown a écrit :
>>>
>>> On Tue, Mar 12, 2013 at 04:15:05PM +0100, Alexis Groppi wrote:
>>>
>>>  Hi Titus,
>>>
>>> Thanks for your answer
>>> Actually it's my second attempt with filter-below-abund.
>>> The first time, I thought the problem was coming from the location of my  table.kh file : in a storage element with poor level performance of I/O
>>> I killed the job after 24h, moved the file in a best place and re run it
>>> But with the same result : no completion after 24h
>>>
>>> Any Idea ?
>>>
>>> Thanks
>>>
>>> Cheers From Bordeaux :)
>>>
>>> Alexis
>>>
>>> PS : The command line was the following :
>>>
>>> ./filter-below-abund.py 174r1_table.kh 174r1_prinseq_good_bFr8.fasta.keep
>>>
>>> Is this correct ?
>>>
>>>  Yes, looks right... Can you try with the bleeding-edge branch, which now
>>> incorporates a potential fix for this issue?
>>>
>>>  From here : https://github.com/ged-lab/khmer/tree/bleeding-edge ?
>>> or
>>> here : https://github.com/ctb/khmer/tree/bleeding-edge ?
>>>
>>> Do I have to make a fresh install ? and How  ?
>>> Or just replace all the files and folders ?
>>>
>>> Thanks :)
>>>
>>> Alexis
>>>
>>>
>>>  thanks,
>>> --titus
>>>
>>>
>>>  Le 12/03/2013 14:41, C. Titus Brown a ?crit :
>>>
>>>  On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
>>>
>>>  Metagenome assembly :
>>> My data :
>>> - original (quality filtered) data : 4463243 reads (75 nt) (Illumina)
>>> 1/ Single pass digital normalization with normalize-by-median (C=20)
>>> ==> file .keep of 2560557 reads
>>> 2/ generated a hash table by load-into-counting on the .keep file
>>> ==> file .kh of ~16Go (huge file ?!)
>>> 3/ filter-below-abund with C=100 from the two previous file (table.kh
>>> and reads.keep)
>>> Still running after 24 hours  :(
>>>
>>> Any advice to speed up this step ? ... and the others (partitionning ...) ?
>>>
>>> I can have an access to a HPC : ~3000 cores.
>>>
>>>  Hi Alexis,
>>>
>>> filter-below-abund and filter-abund have occasional bugs that prevent them
>>> from completing.  I would kill and restart.  For that few reads it should
>>> take no more than a few hours to do everything.
>>>
>>> Most of what khmer does cannot easily be distributed across multiple chassis,
>>> note.
>>>
>>> best,
>>> --titus
>>>
>>>  --
>>>
>>>
>>>   --
>>>
>>> _______________________________________________
>>> khmer mailing list
>>> khmer at lists.idyll.org
>>> http://lists.idyll.org/listinfo/khmer
>>>
>>>
>>
>>
>>  --
>>  Eric McDonald
>> HPC/Cloud Software Engineer
>>   for the Institute for Cyber-Enabled Research (iCER)
>>   and the Laboratory for Genomics, Evolution, and Development (GED)
>> Michigan State University
>> P: 517-355-8733
>>
>>
>>   --
>>
>
>
>
>  --
>  Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733
>
>
> --
>
>
> --
>



-- 
Eric McDonald
HPC/Cloud Software Engineer
  for the Institute for Cyber-Enabled Research (iCER)
  and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/e3c0b40a/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/e3c0b40a/attachment-0008.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/e3c0b40a/attachment-0009.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/e3c0b40a/attachment-0010.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/e3c0b40a/attachment-0011.png>


More information about the khmer mailing list