[khmer] How to speed up the filter-below-abund script ?

Alexis Groppi alexis.groppi at u-bordeaux2.fr
Wed Mar 13 07:13:32 PDT 2013


Hi,

Le 13/03/2013 14:12, Eric McDonald a écrit :
> Hi Alexis,
>
> First, let me say thank you for being patient and working with us in 
> spite of all the problems you are encountering.

That's bioinformatician life ;)

>
> With regards to the floating point exception, I see several 
> opportunities for a division-by-zero condition in the threading 
> utilities used by the script. These opportunities exist if an input 
> file is empty. (The problem may be coming from another place, but this 
> would be my first guess.) What does the following command say:
>
>   ls -lh /scratch/ag/khmer/174r1_table.kh <http://174r1_table.kh/> 
> /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep

  The result : (the files are not empty)
-rw-r--r-- 1 ag users 299M 12 mars  20:54 
/mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
-rw-r--r-- 1 ag users 141G 12 mars  21:05 /scratch/ag/khmer/174r1_table.kh

>
> Also, since you appear to be using TORQUE as your resource 
> manager/batch system, could you please attach the complete output and 
> error files for the job? (These files should be of the form 
> <job_name>.o2693 and <job_name>.e2693, where <job_name> is the name of 
> your job. There may only be one or the other of these files, depending 
> on site defaults and whether you specified "-j oe" or "-j eo" in your 
> job submission.)

I re run the job since I have deleted previous (2693) err/out files.
Here is the new file (merged with the option -j oe in the bash script) :

#############################
User: ag
Date: Wed Mar 13 14:59:21 CET 2013
Host: rainman.cbib.u-bordeaux2.fr
Directory: /mnt/var/home/ag
PBS_JOBID: 2695.rainman
PBS_O_WORKDIR: /mnt/var/home/ag
PBS_NODEFILE:  rainman
#############################
#############################
Debut filter-below-abund: Wed Mar 13 14:59:21 CET 2013
starting threads
starting writer
loading...
... filtering 0
/var/lib/torque/mom_priv/jobs/2695.rainman.SC: line 49: 54757 Floating 
point exception(core dumped) ./khmer-BETA/sandbox/fi
lter-below-abund.py /scratch/ag/khmer/174r1_table.kh 
/mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep

real    3m54.873s
user    0m0.085s
sys     2m2.180s
Date fin: Wed Mar 13 15:03:15 CET 2013
Job finished

Thanks again for your help :)

Alexis

>
> Thanks,
>   Eric
>
>
>
> On Wed, Mar 13, 2013 at 5:38 AM, Alexis Groppi 
> <alexis.groppi at u-bordeaux2.fr <mailto:alexis.groppi at u-bordeaux2.fr>> 
> wrote:
>
>     Hi Eric,
>
>     Thanks for your answer.
>     But unfortunately, after many attempts I'm getting this error :
>
>     starting threads
>     starting writer
>     loading...
>     ... filtering 0
>     /var/lib/torque/mom_priv/jobs/2693.rainman.SC
>     <http://2693.rainman.SC>: line 46: 63657 Floating point
>     exception(core dumped) ./khmer-BETA/sandbox/filter-below-abund.py
>     /scratch/ag/khmer/174r1_table.kh <http://174r1_table.kh>
>     /mnt/var/home/ag/174r1_prinseq_good_bFr8.fasta.keep
>
>     real    3m30.163s
>     user    0m0.088s
>
>     Your opinion ?
>
>     Thanks
>
>     Alexis
>
>
>     Le 13/03/2013 00:55, Eric McDonald a écrit :
>>     Hi Alexis,
>>
>>     One way to get the 'bleeding-edge' branch is to clone it into a
>>     fresh directory; for example:
>>        git clone http://github.com/ged-lab/khmer.git -b bleeding-edge
>>     khmer-BETA
>>
>>     Assuming you already have a clone of the 'ged-lab/khmer' repo,
>>     then you should also be able to do:
>>       git fetch origin
>>       git checkout bleeding-edge
>>     Depending on how old your Git client is and what its defaults
>>     are, you may have to do the following instead:
>>       git checkout --track -b bleeding-edge origin/bleeding-edge
>>
>>     Hope this helps,
>>       Eric
>>
>>
>>     On Tue, Mar 12, 2013 at 11:32 AM, Alexis Groppi
>>     <alexis.groppi at u-bordeaux2.fr
>>     <mailto:alexis.groppi at u-bordeaux2.fr>> wrote:
>>
>>
>>         Le 12/03/2013 16:16, C. Titus Brown a écrit :
>>>         On Tue, Mar 12, 2013 at 04:15:05PM +0100, Alexis Groppi wrote:
>>>>         Hi Titus,
>>>>
>>>>         Thanks for your answer
>>>>         Actually it's my second attempt with filter-below-abund.
>>>>         The first time, I thought the problem was coming from the location of my
>>>>         table.kh  <http://table.kh>  file : in a storage element with poor level performance of I/O
>>>>         I killed the job after 24h, moved the file in a best place and re run it
>>>>         But with the same result : no completion after 24h
>>>>
>>>>         Any Idea ?
>>>>
>>>>         Thanks
>>>>
>>>>         Cheers From Bordeaux :)
>>>>
>>>>         Alexis
>>>>
>>>>         PS : The command line was the following :
>>>>
>>>>         ./filter-below-abund.py174r1_table.kh  <http://174r1_table.kh>  174r1_prinseq_good_bFr8.fasta.keep
>>>>
>>>>         Is this correct ?
>>>         Yes, looks right... Can you try with the bleeding-edge branch, which now
>>>         incorporates a potential fix for this issue?
>>         From here : https://github.com/ged-lab/khmer/tree/bleeding-edge ?
>>         or
>>         here : https://github.com/ctb/khmer/tree/bleeding-edge ?
>>
>>         Do I have to make a fresh install ? and How ?
>>         Or just replace all the files and folders ?
>>
>>         Thanks :)
>>
>>         Alexis
>>
>>
>>>         thanks,
>>>         --titus
>>>
>>>>         Le 12/03/2013 14:41, C. Titus Brown a ?crit :
>>>>>         On Tue, Mar 12, 2013 at 10:48:03AM +0100, Alexis Groppi wrote:
>>>>>>         Metagenome assembly :
>>>>>>         My data :
>>>>>>         - original (quality filtered) data : 4463243 reads (75 nt) (Illumina)
>>>>>>         1/ Single pass digital normalization with normalize-by-median (C=20)
>>>>>>         ==> file .keep of 2560557 reads
>>>>>>         2/ generated a hash table by load-into-counting on the .keep file
>>>>>>         ==> file .kh of ~16Go (huge file ?!)
>>>>>>         3/ filter-below-abund with C=100 from the two previous file (table.kh  <http://table.kh>
>>>>>>         and reads.keep)
>>>>>>         Still running after 24 hours  :(
>>>>>>
>>>>>>         Any advice to speed up this step ? ... and the others (partitionning ...) ?
>>>>>>
>>>>>>         I can have an access to a HPC : ~3000 cores.
>>>>>         Hi Alexis,
>>>>>
>>>>>         filter-below-abund and filter-abund have occasional bugs that prevent them
>>>>>         from completing.  I would kill and restart.  For that few reads it should
>>>>>         take no more than a few hours to do everything.
>>>>>
>>>>>         Most of what khmer does cannot easily be distributed across multiple chassis,
>>>>>         note.
>>>>>
>>>>>         best,
>>>>>         --titus
>>>>         -- 
>>
>>         -- 
>>
>>         _______________________________________________
>>         khmer mailing list
>>         khmer at lists.idyll.org <mailto:khmer at lists.idyll.org>
>>         http://lists.idyll.org/listinfo/khmer
>>
>>
>>
>>
>>     -- 
>>     Eric McDonald
>>     HPC/Cloud Software Engineer
>>       for the Institute for Cyber-Enabled Research (iCER)
>>       and the Laboratory for Genomics, Evolution, and Development (GED)
>>     Michigan State University
>>     P: 517-355-8733 <tel:517-355-8733>
>
>     -- 
>
>
>
>
> -- 
> Eric McDonald
> HPC/Cloud Software Engineer
>   for the Institute for Cyber-Enabled Research (iCER)
>   and the Laboratory for Genomics, Evolution, and Development (GED)
> Michigan State University
> P: 517-355-8733

-- 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/d2f0d02e/attachment-0002.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/d2f0d02e/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/d2f0d02e/attachment-0007.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Signature_Mail_A_Groppi.png
Type: image/png
Size: 29033 bytes
Desc: not available
URL: <http://lists.idyll.org/pipermail/khmer/attachments/20130313/d2f0d02e/attachment-0008.png>


More information about the khmer mailing list