[khmer] Fwd: How to speed up the filter-below-abund script ?

Thu Mar 14 02:42:26 PDT 2013

Hi Eric,

I've tried all the suggestions you made, but I get the same result (see 
the attached e/o file).

With the help of David (the lab's system engineer), I think we have 
found the bug:
==> filter-below-abund.py fills a directory 
(/var/spool/abrt/ccpp-2013-03-14-10\:24\:13-26642.new/coredump) until 
it uses all the available space (see below).
==> Then it crashes.

Is there a way to change this behavior?

Thanks again

Alexis
**************************************************
[root@rainman ~]# ll -h /var/spool/abrt/ccpp-2013-03-14-10\:24\:13-26642.new/
total 12G
-rw-r----- 1 abrt users    4 14 mars  10:24 analyzer
-rw-r----- 1 abrt users    6 14 mars  10:24 architecture
-rw-r----- 1 abrt users  150 14 mars  10:24 cmdline
-rw-r----- 1 abrt users  12G 14 mars  10:24 coredump
-rw-r----- 1 abrt users 1,5K 14 mars  10:24 environ
-rw-r----- 1 abrt users   31 14 mars  10:24 executable
-rw-r----- 1 abrt users   27 14 mars  10:24 hostname
-rw-r----- 1 abrt users   26 14 mars  10:24 kernel
-rw-r----- 1 abrt users  13K 14 mars  10:24 maps
-rw-r----- 1 abrt users   26 14 mars  10:24 os_release
-rw-r----- 1 abrt users   71 14 mars  10:24 reason
-rw-r----- 1 abrt users   10 14 mars  10:24 time
-rw-r----- 1 abrt users    3 14 mars  10:24 uid

[root@rainman ~]# ll -h /var/spool/abrt/ccpp-2013-03-14-10\:24\:13-26642.new/
total 18G
-rw-r----- 1 abrt users    4 14 mars  10:24 analyzer
-rw-r----- 1 abrt users    6 14 mars  10:24 architecture
-rw-r----- 1 abrt users  150 14 mars  10:24 cmdline
-rw-r----- 1 abrt users  18G 14 mars  10:25 coredump
-rw-r----- 1 abrt users 1,5K 14 mars  10:24 environ
-rw-r----- 1 abrt users   31 14 mars  10:24 executable
-rw-r----- 1 abrt users   27 14 mars  10:24 hostname
-rw-r----- 1 abrt users   26 14 mars  10:24 kernel
-rw-r----- 1 abrt users  13K 14 mars  10:24 maps
-rw-r----- 1 abrt users   26 14 mars  10:24 os_release
-rw-r----- 1 abrt users   71 14 mars  10:24 reason
-rw-r----- 1 abrt users   10 14 mars  10:24 time
-rw-r----- 1 abrt users    3 14 mars  10:24 uid
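
One rough way to watch this growth is to report how much space the dump directory occupies and how much is left on its filesystem. The sketch below uses only standard `du` and `df` options; the function names are illustrative and are not part of khmer or abrt.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).

# Print the size of a directory tree in kilobytes.
dir_usage_kb() {
    du -sk "$1" | cut -f1
}

# Print the free space, in kilobytes, on the filesystem holding a path.
free_kb() {
    # -P forces the portable one-line-per-filesystem output format.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example (directory taken from the listing above):
# dump='/var/spool/abrt/ccpp-2013-03-14-10:24:13-26642.new'
# echo "dump: $(dir_usage_kb "$dump") KB, free: $(free_kb "$dump") KB"
```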

On 13/03/2013 22:58, Eric McDonald wrote:
> Forwarding my earlier reply to the list, since I didn't reply-to-all 
> earlier.
>
> Also, Alexis, you may wish to change the following in your job script:
>   #PBS -l nodes=1:ppn=1
> to
>   #PBS -l nodes=1:ppn=8
> assuming that you have 8-core nodes available. 'filter-below-abund.py' 
> uses 8 threads by default; if a 'khmer' job runs on the same node as 
> another job, it may try using more CPU cores than it was allocated and 
> that could create problems with your systems administrators. And, if a 
> job's threads are restricted to the requested number of cores, then 
> you will also not be getting optimal performance by using more threads 
> (8) than available cores (1).
>
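
A job script can check for this mismatch itself. The sketch below assumes TORQUE's usual behaviour of writing one line per allocated core to the file named by $PBS_NODEFILE; the helper names are made up for illustration, and 8 is the default thread count mentioned above.

```shell
# Count the cores TORQUE allocated: one non-empty line per core
# in the node file (assumption: standard $PBS_NODEFILE layout).
allocated_cores() {
    awk 'NF { n++ } END { print n + 0 }' "$1"
}

# Warn when the thread count exceeds the number of allocated cores.
check_threads() {
    threads=$1
    cores=$(allocated_cores "$2")
    if [ "$threads" -gt "$cores" ]; then
        echo "warning: $threads threads but only $cores core(s) allocated"
        return 1
    fi
    echo "ok: $threads threads on $cores core(s)"
}

# In a job script:
# check_threads 8 "$PBS_NODEFILE"
```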
> ---------- Forwarded message ----------
> From: Eric McDonald <emcd.msu at gmail.com>
> Date: Wed, Mar 13, 2013 at 3:12 PM
> Subject: Re: [khmer] How to speed up the filter-below-abund script ?
> To: alexis.groppi at u-bordeaux2.fr
>
>
> Alexis,
>
> I just realized that the floating-point exception is from inside the 
> Python interpreter itself. If the floating-point exception had 
> appeared from within the 'filter-below-abund.py' script, then we should 
> have seen a traceback from the exception, ending with:
>   ZeroDivisionError: float division by zero
> Instead, we are seeing:
> line 49: 54757 Floating point exception(core dumped)
> from your job shell. (I should've noticed that earlier.)
>
> Would you please add the following lines to your job script somewhere 
> before you invoke 'filter-below-abund.py':
>   python --version
>   which python
>
> And would you please add the following line _immediately after_ you 
> invoke 'filter-below-abund.py':
>   echo "Exit Code: $?"
>
> Also, would you remove the 'time' command from in front of your 
> invocation of 'filter-below-abund.py'?
>
> And, one more action before trying again... please run:
>   git pull
> in your 'khmer-BETA' directory. (I added another possible fix to the 
> 'bleeding-edge' branch. This command will pull that fix into your clone.)
>
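
The echo has to come immediately after the invocation because $? always holds the status of the most recent command, so any command in between would overwrite it. A minimal sketch of the requested diagnostics, using a hypothetical wrapper name:

```shell
# Run a command, then report its exit status before anything else can
# overwrite $?. The name run_and_report is illustrative only.
run_and_report() {
    status=0
    "$@" || status=$?
    echo "Exit Code: $status"
    return "$status"
}

# In a job script (paths as used elsewhere in this thread):
# python --version
# which python
# run_and_report ./khmer-BETA/sandbox/filter-below-abund.py \
#     /scratch/ag/khmer/174r1_table.kh \
#     /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
```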
> Thank you,
>   Eric
>
>
> On Wed, Mar 13, 2013 at 10:13 AM, Alexis Groppi 
> <alexis.groppi at u-bordeaux2.fr> wrote:
>
>     Hi,
>
>     On 13/03/2013 14:12, Eric McDonald wrote:
>>     Hi Alexis,
>>
>>     First, let me say thank you for being patient and working with us
>>     in spite of all the problems you are encountering.
>
>     That's bioinformatician life ;)
>
>
>>
>>     With regards to the floating point exception, I see several
>>     opportunities for a division-by-zero condition in the threading
>>     utilities used by the script. These opportunities exist if an
>>     input file is empty. (The problem may be coming from another
>>     place, but this would be my first guess.) What does the following
>>     command say:
>>
>>       ls -lh /scratch/ag/khmer/174r1_table.kh \
>>         /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>
>     The result (the files are not empty):
>     -rw-r--r-- 1 ag users 299M 12 mars 20:54
>     /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>     -rw-r--r-- 1 ag users 141G 12 mars 21:05
>     /scratch/ag/khmer/174r1_table.kh
>
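
The same emptiness check can be scripted before submitting the job. This sketch relies on the shell's `-s` test, which is true when a file exists and is not empty; the function name is hypothetical.

```shell
# Return non-zero if any argument is missing or zero-length.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# In a job script:
# check_inputs /scratch/ag/khmer/174r1_table.kh \
#     /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep || exit 1
```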
>
>>
>>     Also, since you appear to be using TORQUE as your resource
>>     manager/batch system, could you please attach the complete output
>>     and error files for the job? (These files should be of the form
>>     <job_name>.o2693 and <job_name>.e2693, where <job_name> is the
>>     name of your job. There may only be one or the other of these
>>     files, depending on site defaults and whether you specified "-j
>>     oe" or "-j eo" in your job submission.)
>
>     I re-ran the job, since I had deleted the previous (2693) err/out files.
>     Here is the new file (merged with the option -j oe in the bash
>     script):
>
>     #############################
>     User: ag
>     Date: Wed Mar 13 14:59:21 CET 2013
>     Host: rainman.cbib.u-bordeaux2.fr
>     Directory: /mnt/var/home/ag
>     PBS_JOBID: 2695.rainman
>     PBS_O_WORKDIR: /mnt/var/home/ag
>     PBS_NODEFILE:  rainman
>     #############################
>     #############################
>     Debut filter-below-abund: Wed Mar 13 14:59:21 CET 2013
>
>     starting threads
>     starting writer
>     loading...
>     ... filtering 0
>     /var/lib/torque/mom_priv/jobs/2695.rainman.SC: line 49: 54757 Floating point
>     exception(core dumped) ./khmer-BETA/sandbox/filter-below-abund.py
>     /scratch/ag/khmer/174r1_table.kh
>     /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>
>     real    3m54.873s
>     user    0m0.085s
>     sys     2m2.180s
>     Date fin: Wed Mar 13 15:03:15 CET 2013
>     Job finished
>
>     Thanks again for your help :)
>
>     Alexis
>
>
>>
>>     Thanks,
>>       Eric
>>
>>
>>
>>     On Wed, Mar 13, 2013 at 5:38 AM, Alexis Groppi
>>     <alexis.groppi at u-bordeaux2.fr> wrote:
>>
>>         Hi Eric,
>>
>>         Thanks for your answer.
>>         But unfortunately, after many attempts I'm getting this error :
>>
>>         starting threads
>>         starting writer
>>         loading...
>>         ... filtering 0
>>         /var/lib/torque/mom_priv/jobs/2693.rainman.SC: line 46: 63657 Floating point
>>         exception(core dumped)
>>         ./khmer-BETA/sandbox/filter-below-abund.py
>>         /scratch/ag/khmer/174r1_table.kh
>>         /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>>
>>         real    3m30.163s
>>         user    0m0.088s
>>
>>         Your opinion ?
>>
>>         Thanks
>>
>>         Alexis
>>
>>
>>         On 13/03/2013 00:55, Eric McDonald wrote:
>>>         Hi Alexis,
>>>
>>>         One way to get the 'bleeding-edge' branch is to clone it
>>>         into a fresh directory; for example:
>>>            git clone http://github.com/ged-lab/khmer.git -b
>>>         bleeding-edge khmer-BETA
>>>
>>>         Assuming you already have a clone of the 'ged-lab/khmer'
>>>         repo, then you should also be able to do:
>>>           git fetch origin
>>>           git checkout bleeding-edge
>>>         Depending on how old your Git client is and what its
>>>         defaults are, you may have to do the following instead:
>>>           git checkout --track -b bleeding-edge origin/bleeding-edge
>>>
>>>         Hope this helps,
>>>           Eric
>>>
>>>
>>>         On Tue, Mar 12, 2013 at 11:32 AM, Alexis Groppi
>>>         <alexis.groppi at u-bordeaux2.fr> wrote:
>>>
>>>
>>>             On 12/03/2013 16:16, C. Titus Brown wrote:
>>>>             On Tue, Mar 12, 2013 at 04:15:05PM +0100, Alexis Groppi wrote:
>>>>>             Hi Titus,
>>>>>
>>>>>             Thanks for your answer
>>>>>             Actually it's my second attempt with filter-below-abund.
>>>>>             The first time, I thought the problem came from the location of my
>>>>>             table.kh file: a storage element with poor I/O performance.
>>>>>             I killed the job after 24h, moved the file to a better place, and re-ran it,
>>>>>             but with the same result: no completion after 24h.
>>>>>
>>>>>             Any Idea ?
>>>>>
>>>>>             Thanks
>>>>>
>>>>>             Cheers From Bordeaux :)
>>>>>
>>>>>             Alexis
>>>>>
>>>>>             PS : The command line was the following :
>>>>>
>>>>>             ./filter-below-abund.py 174r1_table.kh 174r1_prinseq_good_bFr8.fasta.keep
>>>>>
>>>>>             Is this correct ?
>>>>             Yes, looks right... Can you try with the bleeding-edge branch, which now
>>>>             incorporates a potential fix for this issue?
>>>             From here:
>>>             https://github.com/ged-lab/khmer/tree/bleeding-edge ?
>>>             or from here:
>>>             https://github.com/ctb/khmer/tree/bleeding-edge ?
>>>
>>>             Do I have to do a fresh install, and if so, how?
>>>             Or do I just replace all the files and folders?
>>>
>>>             Thanks :)
>>>
>>>             Alexis
>>>
>>>
>>>>             thanks,
>>>>             --titus
>>>>
>>>>>             On 12/03/2013 14:41, C. Titus Brown wrote:
>>>>>>             On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
>>>>>>>             Metagenome assembly:
>>>>>>>             My data:
>>>>>>>             - original (quality-filtered) data: 4463243 reads (75 nt) (Illumina)
>>>>>>>             1/ Single-pass digital normalization with normalize-by-median (C=20)
>>>>>>>             ==> .keep file of 2560557 reads
>>>>>>>             2/ Generated a hash table with load-into-counting on the .keep file
>>>>>>>             ==> .kh file of ~16 GB (huge file?!)
>>>>>>>             3/ filter-below-abund with C=100 on the two previous files (table.kh
>>>>>>>             and reads.keep)
>>>>>>>             Still running after 24 hours  :(
>>>>>>>
>>>>>>>             Any advice on speeding up this step ... and the others (partitioning ...)?
>>>>>>>
>>>>>>>             I can get access to an HPC: ~3000 cores.
>>>>>>             Hi Alexis,
>>>>>>
>>>>>>             filter-below-abund and filter-abund have occasional bugs that prevent them
>>>>>>             from completing.  I would kill and restart.  For that few reads it should
>>>>>>             take no more than a few hours to do everything.
>>>>>>
>>>>>>             Most of what khmer does cannot easily be distributed across multiple chassis,
>>>>>>             note.
>>>>>>
>>>>>>             best,
>>>>>>             --titus
>>>>>             -- 
>>>
>>>             -- 
>>>
>>>             _______________________________________________
>>>             khmer mailing list
>>>             khmer at lists.idyll.org
>>>             http://lists.idyll.org/listinfo/khmer
>>>
>>>
>>>
>>>
>>>         -- 
>>>         Eric McDonald
>>>         HPC/Cloud Software Engineer
>>>           for the Institute for Cyber-Enabled Research (iCER)
>>>           and the Laboratory for Genomics, Evolution, and
>>>         Development (GED)
>>>         Michigan State University
>>>         P: 517-355-8733
>>
>>         -- 
>>
>>
>>
>>
>>     -- 
>>     Eric McDonald
>>     HPC/Cloud Software Engineer
>>       for the Institute for Cyber-Enabled Research (iCER)
>>       and the Laboratory for Genomics, Evolution, and Development (GED)
>>     Michigan State University
>>     P: 517-355-8733
>
>     -- 
>
>
>
>
> -- 
> Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733
>
>
> _______________________________________________
> khmer mailing list
> khmer at lists.idyll.org
> http://lists.idyll.org/listinfo/khmer

-------------- next part --------------
#############################
User: ag
Date: Thu Mar 14 10:23:34 CET 2013
Host: rainman.cbib.u-bordeaux2.fr
Directory: /scratch/ag/khmer
PBS_JOBID: 2703.rainman
PBS_O_WORKDIR: /scratch/ag/khmer
PBS_NODEFILE:  rainman
#############################
#############################
Python infos:
Python 2.6.6
/mnt/var/home/ag/env/bin/python
#############################
Debut filter-below-abund: Thu Mar 14 10:23:34 CET 2013
starting threads
starting writer
loading...
... filtering 0
/var/lib/torque/mom_priv/jobs/2703.rainman.SC: line 55: 26642 Floating point exception(core dumped) /mnt/var/home/ag/khmer-BETA/sandbox/filter-below-abund.py /scratch/ag/khmer/174r1_table.kh /scratch/ag/khmer/174r1_prinseq_good_bFr8.fasta.keep
Exit Code: 136
Date fin: Thu Mar 14 10:26:26 CET 2013
Job finished
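
For reference, shells report a process killed by a signal as 128 plus the signal number, so the exit code 136 in this log corresponds to signal 8, which is SIGFPE on Linux and matches the "Floating point exception" message. A small sketch for decoding such a status (the function name is illustrative):

```shell
# Translate a shell exit status into a short description.
# Statuses above 128 mean "terminated by signal (status - 128)".
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        echo "killed by signal $sig ($(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

describe_exit_status 136   # on Linux: killed by signal 8 (FPE)
```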