[khmer] gzip files

Eric McDonald emcd.msu at gmail.com
Thu Mar 14 07:47:31 PDT 2013


Hi Huan,

Excellent! Glad to hear it.

Please let us know if you have any additional problems.

Eric


On Thu, Mar 14, 2013 at 10:26 AM, Huan Fan <hfan22 at wisc.edu> wrote:

> Hi Eric,
>
> Thanks! It works now!
>
> Cheers,
> Huan
>
> On 03/13/13, Eric McDonald  wrote:
> > Huan,
> >
> > There are several ways to change the order. Probably the easiest to help
> you with remotely is the following:
> > cd ~/khmer-BETA
> > virtualenv --no-site-packages PYTHON-ENV
> > . PYTHON-ENV/bin/activate
> > export PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python"
> >
> >
> > You could also install screed and khmer into the virtual environment
> above, but that is extra work so let's skip that for now.
> >
> >
> > Hope this helps,
> > Eric
> >
> >
> > P.S. If you don't have 'virtualenv' available, then let me
> know and we can make the necessary changes another way.
> >
> > On Wed, Mar 13, 2013 at 6:41 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> >
> > > Hi Eric,
> > >
> > > I think you're right. But how do I change the order?
> > >
> > > heather at chc2-desktop:~$
> PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -c "import sys,
> pprint; pprint.pprint( sys.path )"
> > > ['',
> > > '/usr/local/lib/python2.7/dist-packages/khmer-0.4-py2.7-linux-x86_64.egg',
> > > '/usr/local/lib/python2.7/dist-packages/screed-0.7-py2.7.egg',
> > > '/usr/local/lib/python2.7/dist-packages/ReferenceFreeTools-1.0.2b-py2.7.egg',
> > > '/usr/local/lib/python2.7/dist-packages/biopython-1.60-py2.7-linux-x86_64.egg',
> > > '/usr/lib/pymodules/python2.7',
> > > '/usr/local/lib/python2.7/dist-packages/HTSeq-0.5.3p9-py2.7-linux-x86_64.egg',
> > > '/usr/local/lib/python2.7/dist-packages/nose-1.2.1-py2.7.egg',
> > > '/home/heather/screed',
> > > '/home/heather/khmer-BETA/python',
> > > '/usr/lib/python2.7',
> > > '/usr/lib/python2.7/plat-linux2',
> > > '/usr/lib/python2.7/lib-tk',
> > > '/usr/lib/python2.7/lib-old',
> > > '/usr/lib/python2.7/lib-dynload',
> > > '/usr/local/lib/python2.7/dist-packages',
> > > '/usr/lib/python2.7/dist-packages',
> > > '/usr/lib/python2.7/dist-packages/PIL',
> > > '/usr/lib/pymodules/python2.7/gtk-2.0',
> > > '/usr/lib/python2.7/dist-packages/gst-0.10',
> > > '/usr/lib/python2.7/dist-packages/gtk-2.0',
> > > '/usr/lib/pymodules/python2.7/ubuntuone-storage-protocol',
> > > '/usr/lib/pymodules/python2.7/ubuntuone-control-panel',
> > > '/usr/lib/pymodules/python2.7/ubuntuone-client',
> > > '/usr/lib/pymodules/python2.7/libubuntuone',
> > > '/usr/lib/python2.7/dist-packages/wx-2.8-gtk2-unicode']
> > >
> > > heather at chc2-desktop:~$
> PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -S -c "import sys,
> pprint; pprint.pprint( sys.path )"
> > > ['',
> > > '/home/heather/screed',
> > > '/home/heather/khmer-BETA/python',
> > > '/usr/lib/python2.7/',
> > > '/usr/lib/python2.7/plat-linux2',
> > > '/usr/lib/python2.7/lib-tk',
> > > '/usr/lib/python2.7/lib-old',
> > > '/usr/lib/python2.7/lib-dynload']
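
In the first listing the khmer-0.4 egg sits ahead of the PYTHONPATH
entries, so it is found first; in the second (python -S) it is absent.
A minimal sketch of why the order decides which copy is imported,
assuming a Python 3 interpreter (the thread itself used Python 2.7).
The package name demo_pkg, the two throwaway directories, and the
helpers first_provider and make_fake_package are hypothetical stand-ins
for the installed egg and the khmer-BETA checkout, not anything from
khmer itself.

```python
import importlib.machinery
import os
import tempfile


def first_provider(module_name, search_path):
    """Return the file that would be loaded for module_name when the
    entries of search_path are tried in order, or None if none has it."""
    spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
    if spec is None:
        return None
    return spec.origin


def make_fake_package(parent_dir, package_name):
    """Create a minimal importable package under parent_dir."""
    package_dir = os.path.join(parent_dir, package_name)
    os.makedirs(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
        handle.write("# placeholder package\n")
    return package_dir


# Hypothetical stand-ins for the system-wide egg and the new checkout.
old_install = tempfile.mkdtemp(prefix="old-egg-")
new_checkout = tempfile.mkdtemp(prefix="new-checkout-")
make_fake_package(old_install, "demo_pkg")
make_fake_package(new_checkout, "demo_pkg")

# Old install listed first (the situation in the thread): old copy wins.
egg_first = first_provider("demo_pkg", [old_install, new_checkout])
# Checkout listed first (the desired order): new copy wins.
checkout_first = first_provider("demo_pkg", [new_checkout, old_install])

print(egg_first)
print(checkout_first)
```

Calling first_provider with "khmer" and the interpreter's real sys.path
should report the file that would actually be loaded, which is the same
information the khmer.__file__ check in this thread prints.
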
> > >
> > > On 03/13/13, Eric McDonald wrote:
> > > > Hi Huan,
> > > >
> > >
> > > > Interesting... the path for the old khmer must be getting placed in
> front of the path for the new one in the Python interpreter's path
> list.
> > > >
> > > >
> > > > Please let us know the results of:
> > > > PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -c "import
> sys, pprint; pprint.pprint( sys.path )"
> > > > PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -S -c
> "import sys, pprint; pprint.pprint( sys.path )"
> > > >
> > > >
> > > > Thanks,
> > > > Eric
> > > >
> > > >
> > > >
> > >
> > > > On Wed, Mar 13, 2013 at 5:27 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> > > >
> > > > > Hi Eric,
> > > > >
> > > > > I reset the PYTHONPATH but nothing changed:
> > > > >
> > > > > heather at chc2-desktop:~/khmer-BETA/scripts$ export
> PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python"
> > > > >
> > > > > heather at chc2-desktop:~/khmer-BETA/scripts$ echo $PYTHONPATH
> > > > > /home/heather/screed:/home/heather/khmer-BETA/python
> > > > >
> > > > > heather at chc2-desktop:~/khmer-BETA/scripts$ python -c "import
> khmer; print khmer.__file__"
> > > > >
> /usr/local/lib/python2.7/dist-packages/khmer-0.4-py2.7-linux-x86_64.egg/khmer/__init__.pyc
> > > > >
> > > > >
> > > > > heather at chc2-desktop:~/khmer-BETA/scripts$ test -r
> ~/khmer-BETA/python/khmer/threading_args.py; echo $?
> > > > > 0
> > > > >
> > > > > heather at chc2-desktop:~/khmer-BETA/scripts$ ./load-into-counting.py
> > > > > Traceback (most recent call last):
> > > > > File "./load-into-counting.py", line 17, in <module>
> > > > > from khmer.threading_args import add_threading_args
> > > > > ImportError: No module named threading_args
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 03/13/13, Eric McDonald wrote:
> > > > > > Hi Huan,
> > > > > >
> > > > >
> > > > > > Thank you for the information. The problem is as I suspected -
> you are not using the correct 'khmer'. Details inline below:
> > > > > >
> > > > > >
> > >
> > > > > > > On Wed, Mar 13, 2013 at 1:08 PM, Huan Fan wrote:
> > > > > > > > > > running build_ext
> > > > > > >
> > > > > > > > > > building 'khmer._khmermodule' extension
> > > > > > > > > creating build
> > > > > > > > > creating build/temp.linux-x86_64-2.7
> > > > > > > > > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2
> -Wall -Wstrict-prototypes -fPIC -I../lib -I/usr/include/python2.7 -c
> _khmermodule.cc -o build/temp.linux-x86_64-2.7/_khmermodule.o
> > > > > > > > > cc1plus: warning: command line option
> "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
> > > > > > > > > In file included from /usr/include/python2.7/Python.h:8:0,
> > > > > > > > > from _khmermodule.cc:7:
> > > > > > > > > /usr/include/python2.7/pyconfig.h:1155:0: warning:
> "_POSIX_C_SOURCE" redefined
> > > > > > > > > /usr/include/features.h:163:0: note: this is the location
> of the previous definition
> > > > > > > > > /usr/include/python2.7/pyconfig.h:1177:0: warning:
> "_XOPEN_SOURCE" redefined
> > > > > > > > > /usr/include/features.h:165:0: note: this is the location
> of the previous definition
> > > > > > > > > g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions
> -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.7/_khmermodule.o
> ../lib/khmer_config.o ../lib/thread_id_map.o ../lib/trace_logger.o
> ../lib/perf_metrics.o ../lib/read_parsers.o ../lib/ktable.o
> ../lib/hashtable.o ../lib/hashbits.o ../lib/counting.o ../lib/subset.o
> ../lib/zlib/adler32.o ../lib/zlib/compress.o ../lib/zlib/crc32.o
> ../lib/zlib/deflate.o ../lib/zlib/gzio.o ../lib/zlib/infback.o
> ../lib/zlib/inffast.o ../lib/zlib/inflate.o ../lib/zlib/inftrees.o
> ../lib/zlib/trees.o ../lib/zlib/uncompr.o ../lib/zlib/zutil.o
> ../lib/bzip2/blocksort.o ../lib/bzip2/huffman.o ../lib/bzip2/crctable.o
> ../lib/bzip2/randtable.o ../lib/bzip2/compress.o ../lib/bzip2/decompress.o
> ../lib/bzip2/bzlib.o ../lib/storage.hh ../lib/khmer.hh
> ../lib/khmer_config.hh ../lib/ktable.hh ../lib/hashtable.hh
> ../lib/counting.hh -L../lib -o
> /home/heather/khmer-screed/python/khmer/_khmermodule.so
> > > > > > >
> > > > > > > > > > make[1]: Leaving directory
> `/home/heather/khmer-screed/python'
> > > > > > > > > nosetests -v -x
> > > > > > > > > make: nosetests: Command not found
> > > > > > > > > make: *** [test] Error 127
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > **************
> > > > > > > > > $ make all
> > > > > > > > > cd lib && \
> > > > > > > > > make CXX="g++" CXXFLAGS=" -Wall -O3 -fPIC" LIBS=""
> > > > > > >
> > > > > > > > > > make[1]: Entering directory
> `/home/heather/khmer-screed/lib'
> > > > > > > > > > make[1]: Nothing to be done for `all'.
> > > > > > > > > > make[1]: Leaving directory
> `/home/heather/khmer-screed/lib'
> > > > > > > > > cd python && \
> > > > > > > > > make DEFINE_KHMER_EXTRA_SANITY_CHECKS="" \
> > > > > > > > > CXX_DEBUG_FLAGS="" \
> > > > > > > > > CYTHON_ENABLED_BOOL="False"
> > > > > > >
> > > > > > > > > > make[1]: Entering directory
> `/home/heather/khmer-screed/python'
> > > > > > > > > python setup.py build_ext -i
> > > > > > > > > running build_ext
> > > > > > >
> > > > > > > > > > make[1]: Leaving directory
> `/home/heather/khmer-screed/python'
> > > > > > > > >
> > > > > > > > > ***********************
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > When trying to run one of the scripts:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > $ ./load-into-counting.py
> > > > > > > > > Traceback (most recent call last):
> > > > > > > > > File "./load-into-counting.py", line 16, in <module>
> > > > > > > > > from khmer.threading_args import add_threading_args
> > > > > > > > > ImportError: No module named threading_args
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Would you please give me a hint on what is going on here?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Huan
> > > > > > > > >
> > > > > > > > > On 01/24/13, Eric McDonald wrote:
> > > > > > > > > > Huan,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > Thank you for the nice bug report. I was able to
> reproduce the problem with the 'master' (default) branch of the
> 'ged-lab/khmer' repository. If this bug is preventing you from
> making progress, then I would recommend that you try our "beta tester"
> branch, known as 'bleeding-edge':
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > git clone -b bleeding-edge
> http://github.com/ged-lab/khmer.git khmer-BETA
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > The 'bleeding-edge' branch contains rewritten
> FASTA and FASTQ parsers (among other things) and it fixes the problem you
> have reported. I get identical results for the raw FASTA and gzip'd
> FASTA files using this development branch.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > However, since 'bleeding-edge' is for beta
> testing, there is the possibility that you may find new bugs elsewhere in
> the code. Please feel free to file more nice bug reports if you use that
> branch and encounter other bugs.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > Eric
> > > > > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
> > > > > > > > > > > On Thu, Jan 24, 2013 at 3:35 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Titus,
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > > This is a follow-up on the gzip file issues we discussed
> last month. So I've been using "load-into-counting.py" and
> "abundance-dist.py" to calculate the abundance distribution of k-mers in
> some gzip files. I found some of the results weird and I suspect it might have
> to do with the files being gzipped. So I made a small test file and it
> did give different results for the original file and the gzipped one. The test
> files are attached and the commands I ran and the results are as follows:
> > > > > > > > > > >
> > > > > > > > > > > for test.fa with k=9
> > > > > > >
> > > > > > > > > > > > $ ./load-into-counting.py -k 9 -N 4 -x 2e9
> > > > > > > > > > > > result:
> > > > > > > > > > > 0 0 0 0.0
> > > > > > > > >
> > > > > > > > > > > 1 928 928
> 0.993(tel:1%20928%20928%200.993)(tel:1%20928%20928%200.993)
> > > > > > > > > > > 2 7 935 1.0
> > > > > > > > > > >
> > > > > > > > > > > for test.fa.gz with k=9
> > > > > > > > >
> > > > > > >
> > > > >
> > >
> > > > > > > > > > > $ ./load-into-counting.py -k 9 -N 4 -x 2e9
> test_k9_gz.kh(http://test_k9_gz.kh)(http://test_k9_gz.kh)(
> http://test_k9_gz.kh)(http://test_k9_gz.kh)(http://test_k9_gz.kh)
> test.fa.gz
> > > > > > > > > > >
> > > > > > > > > > > $ ./abundance-dist.py -s test_k9_gz.kh(
> http://test_k9_gz.kh)(http://test_k9_gz.kh)(http://test_k9_gz.kh)(
> http://test_k9_gz.kh)(http://test_k9_gz.kh) test.fa.gz test_k9_gz.hist
> > > > > > > > > > > result:
> > > > > > > > > > > 0 0 0 0.0
> > > > > > > > > > > 1 894 894 0.94 8
> > > > > > > > > > > 2 49 943 1.0
> > > > > > > > > > >
> > > > > > > > > > > According to the data itself, there should be 935
> unique 9-mers in total and seven 9-mers appeared twice, just as the result
> for test.fa.
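
One way to cross-check a comparison like the one above independently of
khmer is a plain in-memory count. The sketch below is not khmer's
implementation: it counts exact forward-strand k-mers only, so its
numbers need not match khmer's output. It merely illustrates the four
columns as they appear in the quoted results (abundance, number of
distinct k-mers, cumulative count, cumulative fraction) and checks that
a plain and a gzipped copy of the same FASTA yield identical rows. The
file name toy.fa, its contents, and the function names are hypothetical.

```python
import gzip
import os
import tempfile
from collections import Counter


def read_fasta_sequences(path):
    """Yield sequences from a FASTA file; paths ending in .gz are gunzipped."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        chunks = []
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            elif line:
                chunks.append(line.upper())
        if chunks:
            yield "".join(chunks)


def abundance_distribution(path, k):
    """Return rows of (abundance, n_kmers, cumulative, cumulative_fraction)."""
    kmer_counts = Counter()
    for sequence in read_fasta_sequences(path):
        for start in range(len(sequence) - k + 1):
            kmer_counts[sequence[start:start + k]] += 1
    histogram = Counter(kmer_counts.values())
    total = sum(histogram.values())
    rows, cumulative = [], 0
    for abundance in sorted(histogram):
        cumulative += histogram[abundance]
        rows.append((abundance, histogram[abundance], cumulative,
                     round(cumulative / total, 3)))
    return rows


# Write the same toy records once as plain text and once gzipped.
work_dir = tempfile.mkdtemp()
plain_path = os.path.join(work_dir, "toy.fa")
gzip_path = plain_path + ".gz"
records = ">r1\nACGTACGTAC\n>r2\nTTTTGGGGCC\n"
with open(plain_path, "w") as handle:
    handle.write(records)
with gzip.open(gzip_path, "wt") as handle:
    handle.write(records)

plain_rows = abundance_distribution(plain_path, 4)
gzip_rows = abundance_distribution(gzip_path, 4)
for row in plain_rows:
    print(*row)
print("identical:", plain_rows == gzip_rows)
```

Because everything is held in a dictionary, this is only practical for
small test files such as the one described above.
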
> > > > > > > > > > >
> > > > > > > > > > > Any idea what is going on here?
> > > > > > > > > > >
> > > > > > > > > > > Thanks very much!
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Huan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 12/30/12, "C. Titus Brown" wrote:
> > > > > > > > > > > > Excellent, glad to hear it!
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, we need to invest in more of a documentation
> effort :)
> > > > > > > > > > > >
> > > > > > > > > > > > best,
> > > > > > > > > > > > --titus
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Dec 30, 2012 at 05:12:35PM +0800, Huan Fan
> wrote:
> > > > > > > > > > > > > Hi Titus,
> > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > > > > This is embarrassing. Yes it does! Sorry, I
> shouldn't have assumed that it doesn't, merely because the example
> is given in .fa
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks so much!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > Huan
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 12/30/12, "C. Titus Brown" wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Huan,
> > > > > > > > >
> > > > > > > > > > > > > > > are you sure it *doesn't* read in gzipped
> files? It should. Which kind of
> > > > > > > > > > > > > > hash table are you using -- counting or bit?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > cheers,
> > > > > > > > > > > > > > --titus
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sat, Dec 29, 2012 at 09:09:49PM +0800, Huan
> Fan wrote:
> > > > > > > > > > > > > > > Dear khmer developer(s),
> > > > > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > > > > > > First of all, thanks for those useful modules!
> I would like to use the function ht.consume_fasta in my pipeline (in Python);
> however, I work with really big files and they are always in gzip format. I
> am wondering whether it is possible to make ht.consume_fasta able to take
> gzip files? I don't know C so I tried to "graft" some relevant code to
> hashtable.cc but failed. It will be really appreciated if you can make this
> feature available.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks ahead and happy holidays!
> > > > > > > > > > > > > > > Cheers,
> > > > > > > > > > > > > > > Huan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
> > > > > > > > > > > > > > C. Titus Brown, ctb at msu.edu <ctb at msu.edu> <
> ctb at msu.edu <ctb at msu.edu>> <ctb at msu.edu <ctb at msu.edu> <ctb at msu.edu <
> ctb at msu.edu>>> <ctb at msu.edu <ctb at msu.edu> <ctb at msu.edu <ctb at msu.edu>> <
> ctb at msu.edu <ctb at msu.edu> <ctb at msu.edu <ctb at msu.edu
> >>>>(java_script:main.compose()
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > > C. Titus Brown, ctb at msu.edu
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Eric McDonald
> > > > > > > > > > HPC/Cloud Software Engineer
> > > > > > > > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > > > > > > > and the Laboratory for Genomics, Evolution, and
> Development (GED)
> > > > > > > > > > Michigan State University
> > > > > > >
> > >
> > > > > > > > > > P: 517-355-8733(tel:517-355-8733)(tel:517-355-8733(tel:
> 517-355-8733))(tel:517-355-8733(tel:517-355-8733)(tel:517-355-8733(tel:
> 517-355-8733)))
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Eric McDonald
> > > > > > > > HPC/Cloud Software Engineer
> > > > > > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > > > > > and the Laboratory for Genomics, Evolution, and Development
> (GED)
> > > > > > > > Michigan State University
> > > > > > > > > P: 517-355-8733
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Eric McDonald
> > > > > > HPC/Cloud Software Engineer
> > > > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > > > and the Laboratory for Genomics, Evolution, and Development (GED)
> > > > > > Michigan State University
> > > > > > > P: 517-355-8733
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Eric McDonald
> > > > HPC/Cloud Software Engineer
> > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > and the Laboratory for Genomics, Evolution, and Development (GED)
> > > > Michigan State University
> > > > > P: 517-355-8733
> > >
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > Eric McDonald
> > HPC/Cloud Software Engineer
> > for the Institute for Cyber-Enabled Research (iCER)
> > and the Laboratory for Genomics, Evolution, and Development (GED)
> > Michigan State University
> > P: 517-355-8733
>



-- 
Eric McDonald
HPC/Cloud Software Engineer
  for the Institute for Cyber-Enabled Research (iCER)
  and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733