<div dir="ltr">Thank you, Julia, for doing a git checkout. Just to verify that operation, can you share the output of `git describe` inside your checkout?<div><br></div><div>I'm also tracking this issue on GitHub: <a href="https://github.com/ged-lab/khmer/issues/266">https://github.com/ged-lab/khmer/issues/266</a></div>
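<div><br></div><div>As a sanity check on the expected footprint, here is a minimal sketch of the arithmetic behind the 64 GB figure quoted below. It assumes each table entry costs one byte, so table memory is roughly -N times -x. The function name `expected_table_gb` is made up for illustration and is not part of khmer.</div>
<pre>
```python
# Rough estimate of counting-table memory for normalize-by-median.py.
# Assumption: one byte per table entry, so memory is about N * x bytes.
# expected_table_gb is a hypothetical helper, not a khmer function.
def expected_table_gb(n_tables, table_size):
    """Return the estimated table memory in GB (10**9 bytes)."""
    return n_tables * table_size / 1e9


if __name__ == "__main__":
    # -N 4 -x 16e9, the last command in the report below
    print(expected_table_gb(4, 16e9))
    # -N 4 -x 60e9, the first command in the report below
    print(expected_table_gb(4, 60e9))
```
</pre>
<div>Under that assumption the -x 16e9 run should sit near 64 GB, while the job below was killed at 69.16 GB, which suggests something other than the tables themselves may be consuming memory.</div>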
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Dec 24, 2013 at 5:59 PM, Oh, Julia (NIH/NHGRI) [F] <span dir="ltr"><<a href="mailto:julia.oh@nih.gov" target="_blank">julia.oh@nih.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Results are in and the error reproduced:<br>
<br>
The following commands yield:<br>
<div class="im">python2.7 /home/ohjs/khmer/scripts/normalize-by-median.py -C 20 -k 20 -N 4 -x 60e9 --savehash <a href="http://round2.unaligned_ref.kh" target="_blank">round2.unaligned_ref.kh</a> -R round2.unaligned_1.report round2.unaligned;<br>
</div>python2.7 /home/ohjs/khmer/scripts/filter-abund.py <a href="http://round2.unaligned_ref.kh" target="_blank">round2.unaligned_ref.kh</a> round2.unaligned.keep;<br>
<div class="im">python2.7 /home/ohjs/khmer/scripts/normalize-by-median.py -C 5 -k 20 -N 4 -x 16e9 round2.unaligned.keep.abundfilt;<br>
<br>
</div>This last command yields:<br>
<br>
########<br>
... kept 116741181 of 151000000 or 77%<br>
... in file round2.unaligned.keep.abundfilt<br>
... kept 116816167 of 151100000 or 77%<br>
... in file round2.unaligned.keep.abundf-------- running PBS epilogue script (5081978.biobos p78 ohjs) --------<br>
<br>
Show some job stats:<br>
<br>
5081978.biobos elapsed time: 9485 seconds<br>
5081978.biobos walltime: 02:37:52 hh:mm:ss<br>
5081978.biobos memory limit: 69.14 GB<br>
5081978.biobos memory used: 69.16 GB<br>
5081978.biobos cpupercent used: 98.00 %<br>
<br>
==================================================================================================<br>
|| NOTE: this job was likely deleted by the batch system due to exceeding available memory. ||<br>
==================================================================================================<br>
<br>
#########<br>
<br>
<br>
Thanks & happy holidays,<br>
Julia<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
On Dec 18, 2013, at 10:46 AM, C. Titus Brown <<a href="mailto:ctb@msu.edu">ctb@msu.edu</a>> wrote:<br>
<br>
> On Wed, Dec 18, 2013 at 03:43:22PM +0000, Oh, Julia (NIH/NHGRI) [F] wrote:<br>
>> [ohjs@helix khmer]$ git checkout master<br>
>> Branch master set up to track remote branch master from origin.<br>
>> Switched to a new branch 'master'<br>
>> [ohjs@helix khmer]$ make<br>
>><br>
>> ===> lots of stuff, ending with:<br>
>><br>
>> copying build/lib.linux-x86_64-2.6/khmer/_khmermodule.so -> khmer<br>
>> make[1]: Leaving directory `/home/ohjs/khmer/python'<br>
>><br>
>> [ohjs@helix khmer]$ git branch<br>
>> bleeding-edge<br>
>> * master<br>
><br>
> OK, great! This is the latest development version; can you see if you can<br>
> reproduce the problem with it? (Sadly, I expect you will, as we haven't<br>
> made many significant changes to normalize-by-median's machinery...)<br>
><br>
> best,<br>
> --titus<br>
><br>
>> On Dec 18, 2013, at 8:10 AM, C. Titus Brown <<a href="mailto:ctb@msu.edu">ctb@msu.edu</a>> wrote:<br>
>><br>
>>> On Wed, Dec 18, 2013 at 03:07:57AM +0000, Oh, Julia (NIH/NHGRI) [F] wrote:<br>
>>>>> Titus -- thanks for the tip on variable coverage; will definitely try that out.<br>
>>><br>
>>> Great -- should significantly improve sensitivity to low coverage "stuff"!<br>
>>><br>
>>>>> Michael -- pretty sure I did a git clone. The last date in my directory is Sept 5th -- but not sure if that would be the pull date or your last modified date.<br>
>>><br>
>>> OK, and then one last check... did you check out the 'master' or 'legacy'<br>
>>> branch? What does 'git branch' report?<br>
>>><br>
>>> To check out master, do:<br>
>>><br>
>>> git checkout master<br>
>>> make<br>
>>><br>
>>> cheers,<br>
>>> --titus<br>
>>><br>
>>>>> On Dec 17, 2013, at 8:16 PM, Michael R. Crusoe <<a href="mailto:mcrusoe@msu.edu">mcrusoe@msu.edu</a>> wrote:<br>
>>>><br>
>>>> Hello Julia,<br>
>>>><br>
>>>> What version of khmer are you using?<br>
>>>><br>
>>>> That is, did you install via `pip` or a `git clone`?<br>
>>>><br>
>>>><br>
>>>> On Tue, Dec 17, 2013 at 5:14 PM, C. Titus Brown <<a href="mailto:ctb@msu.edu">ctb@msu.edu</a>> wrote:<br>
>>>> On Tue, Dec 17, 2013 at 04:36:34PM -0800, C. Titus Brown wrote:<br>
>>>>> On Tue, Dec 17, 2013 at 07:53:18PM +0000, Oh, Julia (NIH/NHGRI) [F] wrote:<br>
>>>>> Now, on to your real question :)<br>
>>>>><br>
>>>>>> $python2.7 /home/ohjs/khmer/scripts/normalize-by-median.py -C 5 -k 20 -N 4 -x 16e9 round2.unaligned.keep.abundfilt;<br>
>>>>>><br>
>>>>>> I thought I would be maxing out at 64 GB RAM for the hash table (I've also used 32e9), but I get the following RAM usage report of<br>
>>>>>><br>
>>>>>> 4986693.biobos elapsed time: 23358 seconds<br>
>>>>>> 4986693.biobos walltime: 06:28:36 hh:mm:ss<br>
>>>>>> 4986693.biobos memory limit: 249.00 GB<br>
>>>>>> 4986693.biobos memory used: 249.76 GB<br>
>>>>>> 4986693.biobos cpupercent used: 98.00 %<br>
>>>>><br>
>>>>> What the heck!? That's not supposed to happen!<br>
>>>>><br>
>>>>> This is either a bug, or (most likely) is being caused by an overabundance of<br>
>>>>> high-abundance k-mers. The latter is easy to fix -- I've filed a bug report to<br>
>>>>> fix it in the software overall [0] -- but would require you to modify<br>
>>>>> the script at the moment. If you're up for that, put<br>
>>>>><br>
>>>>> ht.set_use_bigcount(False)<br>
>>>>><br>
>>>>> at line 186 of normalize-by-median:<br>
>>>><br>
>>>> Darn it, that can't be the problem; I just wrote a test against this<br>
>>>> behavior and we actually did things right in the script and ignored<br>
>>>> high abundance k-mers.<br>
>>>><br>
>>>> So, this must be a bug of some sort. Umm... Michael, any ideas?!<br>
>>>><br>
>>>> cheers,<br>
>>>> --titus<br>
>>>> --<br>
>>>> C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a><br>
>>>><br>
>>>> _______________________________________________<br>
>>>> khmer mailing list<br>
>>>> <a href="mailto:khmer@lists.idyll.org">khmer@lists.idyll.org</a><br>
>>>> <a href="http://lists.idyll.org/listinfo/khmer" target="_blank">http://lists.idyll.org/listinfo/khmer</a><br>
>>>><br>
>>>><br>
>>>><br>
>>>> --<br>
>>>> Michael R. Crusoe: Software Engineer and Bioinformatician <a href="mailto:mcrusoe@msu.edu">mcrusoe@msu.edu</a><br>
>>>> @ the Genomics, Evolution, and Development lab; Michigan State University<br>
>>>> <a href="http://ged.msu.edu/" target="_blank">http://ged.msu.edu/</a> <a href="http://orcid.org/0000-0002-2961-9670" target="_blank">http://orcid.org/0000-0002-2961-9670</a> @biocrusoe<<a href="http://twitter.com/biocrusoe" target="_blank">http://twitter.com/biocrusoe</a>><br>
>>>><br>
>>><br>
>>> --<br>
>>> C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a><br>
>><br>
><br>
> --<br>
> C. Titus Brown, <a href="mailto:ctb@msu.edu">ctb@msu.edu</a><br>
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><font face="courier new, monospace">Michael R. Crusoe: Software Engineer and Bioinformatician <a href="mailto:mcrusoe@msu.edu" target="_blank">mcrusoe@msu.edu</a><br>
@ the Genomics, Evolution, and Development lab; Michigan State University<br><a href="http://ged.msu.edu/" target="_blank">http://ged.msu.edu/</a> <a href="http://orcid.org/0000-0002-2961-9670" target="_blank">http://orcid.org/0000-0002-2961-9670</a> <a href="http://twitter.com/biocrusoe" target="_blank">@biocrusoe</a></font><br>
</div>
</div>