[khmer] gzip files

Eric McDonald emcd.msu at gmail.com
Wed Mar 13 15:25:50 PDT 2013


Hi Huan,

Interesting... the path for the old khmer must be getting placed in front
of that for the new one in the Python interpreter's path list.

Please let us know the results of:
  PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -c "import sys, pprint; pprint.pprint( sys.path )"
  PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python" python -S -c "import sys, pprint; pprint.pprint( sys.path )"
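
If it helps when reading that output, here is a minimal sketch (not part
of khmer; the helper name find_package_dirs is made up for illustration)
that lists, in search order, every directory on sys.path holding a 'khmer'
package. Python imports the first match, so if the egg under dist-packages
shows up before khmer-BETA/python, that would be consistent with the
ImportError. It assumes the package is a plain directory containing
__init__.py or __init__.pyc, so a zipped egg would not be detected.

```python
import os
import sys


def find_package_dirs(name, paths=None):
    """Return directories named `name` that look like packages, in search order.

    `paths` defaults to sys.path; passing a list makes the helper easy to test.
    Assumption: a package is a directory holding __init__.py or __init__.pyc.
    """
    hits = []
    for entry in (sys.path if paths is None else paths):
        # An empty sys.path entry means the current working directory.
        base = entry or os.getcwd()
        candidate = os.path.join(base, name)
        markers = ("__init__.py", "__init__.pyc")
        if any(os.path.isfile(os.path.join(candidate, m)) for m in markers):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The first line printed is the copy that "import khmer" would pick up.
    for position, path in enumerate(find_package_dirs("khmer")):
        print("%d %s" % (position, path))
```

Saved under any file name and run with the same PYTHONPATH setting as the
commands above, it prints one line per copy of khmer that Python can see.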

Thanks,
  Eric


On Wed, Mar 13, 2013 at 5:27 PM, Huan Fan <hfan22 at wisc.edu> wrote:

> Hi Eric,
>
> I reset the PYTHONPATH but nothing changed:
>
> heather at chc2-desktop:~/khmer-BETA/scripts$ export
> PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python"
>
> heather at chc2-desktop:~/khmer-BETA/scripts$ echo $PYTHONPATH
> /home/heather/screed:/home/heather/khmer-BETA/python
>
> heather at chc2-desktop:~/khmer-BETA/scripts$ python -c "import khmer; print
> khmer.__file__"
>
> /usr/local/lib/python2.7/dist-packages/khmer-0.4-py2.7-linux-x86_64.egg/khmer/__init__.pyc
>
> heather at chc2-desktop:~/khmer-BETA/scripts$ test -r
> ~/khmer-BETA/python/khmer/threading_args.py; echo $?
> 0
>
> heather at chc2-desktop:~/khmer-BETA/scripts$ ./load-into-counting.py
> Traceback (most recent call last):
>  File "./load-into-counting.py", line 17, in <module>
>  from khmer.threading_args import add_threading_args
> ImportError: No module named threading_args
>
>
>
> On 03/13/13, Eric McDonald  wrote:
> > Hi Huan,
> >
> > Thank you for the information. The problem is as I suspected - you are
> > not using the correct 'khmer'. Details inline below:
> >
> >
> > On Wed, Mar 13, 2013 at 1:08 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> >
> > >
> >
> >
> >
> > > heather at chc2-desktop:~/khmer-BETA/scripts$ ./load-into-counting.py
> > > Traceback (most recent call last):
> > >
> > > File "./load-into-counting.py", line 17, in <module>
> > > from khmer.threading_args import add_threading_args
> > > ImportError: No module named threading_args
> > >
> > >
> > > heather at chc2-desktop:~/khmer-BETA/scripts$ python -c "import khmer; print khmer.__file__"
> > >
> > > /usr/local/lib/python2.7/dist-packages/khmer-0.4-py2.7-linux-x86_64.egg/khmer/__init__.pyc
> > >
> > >
> >
> >
> > This is the wrong 'khmer'. The correct output should be:
> > /home/heather/khmer-BETA/python/khmer/__init__.pyc
> > and that will only happen if you set the PYTHONPATH correctly.
> >
> >
> > > heather at chc2-desktop:~/khmer-BETA/scripts$ test -r python/khmer/threading_args.py; echo $?
> > > 1
> > >
> >
> > Sorry, that command should be:
> > test -r ~/khmer-BETA/python/khmer/threading_args.py; echo $?
> >
> >
> > > heather at chc2-desktop:~/khmer-BETA/scripts$ echo $PYTHONPATH
> > > /home/heather/screed
> > >
> >
> >
> > You also need the path to your 'khmer' in PYTHONPATH. Please try:
> > export PYTHONPATH="$HOME/screed:$HOME/khmer-BETA/python"
> > echo $PYTHONPATH
> > and then:
> > ./load-into-counting.py
> >
> >
> > Hope this helps,
> > Eric
> >
> >
> >
> > >
> > > On 03/12/13, Eric McDonald wrote:
> > >
> > > > Hi Huan,
> > > >
> > > > (This is a follow-up on a conversation we had in January about the
> > > > 'khmer' software. I tried replying to you back then, but the
> > > > mailing list was having some problems. I suppose you may have moved on to
> > > > other things by now, but if you would still like help using the software, I
> > > > can try to help. Original reply is below....)
> > > >
> > > >
> > > >
> > > >
> > > > Thanks for the report. I cannot reproduce the problem.
> > > >
> > > >
> > > >
> > > >
> > >
> > > > Let's verify that Python is looking at the correct khmer. Can
> > > > you please report the output of:
> > > >
> > > >
> > > > python -c "import khmer; print khmer.__file__"
> > > >
> > > >
> > > > Also, can you report the output of:
> > > >
> > > >
> > > > test -r python/khmer/threading_args.py; echo $?
> > > >
> > > >
> > > > And, for good measure, what is the output of:
> > > >
> > > > echo $PYTHONPATH
> > > > echo $PATH
> > > >
> > > >
> > > >
> > > >
> > > > Regarding your comment about screed, I agree that setting the
> > > > PYTHONPATH for it is annoying and easy to forget. (I forget about this
> > > > frequently.) In the longer term, we are hoping to remove or directly
> > > > incorporate the screed dependency in khmer.
> > > >
> > > >
> > >
> > > > One thing which may help you is the 'virtualenv' package. If
> > > > you have 'virtualenv' installed, then you should be able to do the
> > > > following steps:
> > > >
> > > >
> > > > virtualenv --no-site-packages PYTHON-ENV
> > > > . PYTHON-ENV/bin/activate
> > > > (cd screed && python setup.py install)
> > > > (cd khmer && make -j4 && cd python && python setup.py install)
> > > >
> > > >
> > > > Once you have done that, then, in the future, you will only need to
> > > > do:
> > > >
> > > >
> > > > . PYTHON-ENV/bin/activate
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Eric
> > > >
> > > >
> > > >
> > >
> > > > On Sat, Jan 26, 2013 at 1:05 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> > > >
> > > > > Hi Eric,
> > > > >
> > > > > Thanks very much for looking into this!
> > > > >
> > > > >
> > > > > I tried to install the bleeding-edge version but failed. I suspect
> > > > > it might have something to do with screed. So I installed screed for the khmer
> > > > > version that I was using last Aug and forgot what I did to tell khmer where
> > > > > screed is. What I did this time is:
> > > > >
> > > > >
> > >
> > > > > export PYTHONPATH='/home/heather/screed'
> > > > >
> > > > >
> > > > > The make test error is like:
> > > > >
> > > > >
> > > > > **************
> > > > > $ make test
> > > > > cd lib && \
> > > > > make CXX="g++" CXXFLAGS=" -Wall -O3 -fPIC" LIBS=""
> > >
> > > > > make[1]: Entering directory `/home/heather/khmer-screed/lib'
> > > > > (cd zlib && ./configure --shared && make libz.so.1.2.3)
> > > > > Checking for gcc...
> > > > > Checking for shared library support...
> > > > > Building shared library libz.so.1.2.3 with gcc.
> > > > > Checking for unistd.h... Yes.
> > > > > Checking whether to use vs[n]printf() or s[n]printf()... using vs[n]printf()
> > > > > Checking for vsnprintf() in stdio.h... Yes.
> > > > > Checking for return value of vsnprintf()... Yes.
> > > > > Checking for errno.h... Yes.
> > > > > Checking for mmap support... Yes.
> > >
> > > > > make[2]: Entering directory `/home/heather/khmer-screed/lib/zlib'
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o adler32.o adler32.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o compress.o compress.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o crc32.o crc32.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o gzio.o gzio.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o uncompr.o uncompr.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o deflate.o deflate.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o trees.o trees.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o zutil.o zutil.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o inflate.o inflate.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o infback.o infback.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o inftrees.o inftrees.c
> > > > > gcc -O3 -fPIC -DUSE_MMAP -c -o inffast.o inffast.c
> > > > > #gcc -shared -Wl,-soname,libz.so.1 -o libz.so.1.2.3 adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o zutil.o inflate.o infback.o inftrees.o inffast.o
> > > > > #rm -f libz.so libz.so.1
> > > > > #ln -s libz.so.1.2.3 libz.so
> > > > > #ln -s libz.so.1.2.3 libz.so.1
> > >
> > > > > make[2]: Leaving directory `/home/heather/khmer-screed/lib/zlib'
> > > > > (cd bzip2 && make -f Makefile-libbz2_so all)
> > > > > make[2]: Entering directory `/home/heather/khmer-screed/lib/bzip2'
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c blocksort.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c huffman.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c crctable.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c randtable.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c compress.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c decompress.c
> > > > > gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -c bzlib.c
> > > > > #gcc -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.6 blocksort.o huffman.o crctable.o randtable.o compress.o decompress.o bzlib.o
> > > > > #gcc -fpic -fPIC -Wall -Winline -O2 -g -D_FILE_OFFSET_BITS=64 -o bzip2-shared bzip2.c libbz2.so.1.0.6
> > > > > #rm -f libbz2.so.1.0
> > > > > #ln -s libbz2.so.1.0.6 libbz2.so.1.0
> > > > > make[2]: Leaving directory `/home/heather/khmer-screed/lib/bzip2'
> > > > > g++ -Wall -O3 -fPIC -c -o khmer_config.o khmer_config.cc
> > > > > g++ -Wall -O3 -fPIC -c -o thread_id_map.o thread_id_map.cc
> > > > > g++ -Wall -O3 -fPIC -c -o trace_logger.o trace_logger.cc
> > > > > g++ -Wall -O3 -fPIC -c -o perf_metrics.o perf_metrics.cc
> > > > > g++ -Wall -O3 -fPIC -c -o ktable.o ktable.cc
> > > > > g++ -Wall -O3 -fPIC -c -o parsers.o parsers.cc
> > > > > g++ -Wall -O3 -fPIC -c -o read_parsers.o read_parsers.cc
> > > > > g++ -Wall -O3 -fPIC -c -o hashtable.o hashtable.cc
> > > > > g++ -Wall -O3 -fPIC -c -o hashbits.o hashbits.cc
> > > > > g++ -Wall -O3 -fPIC -c -o subset.o subset.cc
> > > > > g++ -Wall -O3 -fPIC -c -o counting.o counting.cc
> > > > > g++ -Wall -O3 -fPIC -c -o bittest.o bittest.cc
> > > > > g++ -o bittest bittest.o ktable.o
> > > > > g++ -Wall -O3 -fPIC -c -o ktable_test.o ktable_test.cc
> > > > > ktable_test.cc: In function ‘int main()’:
> > > > > ktable_test.cc:40:14: warning: deprecated conversion from string constant to ‘char*’
> > > > > g++ -o ktable_test ktable_test.o hashtable.o parsers.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o
> > > > > g++ -Wall -O3 -fPIC -c -o test-StreamReader.o test-StreamReader.cc
> > > > > g++ -o test-StreamReader test-StreamReader.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o
> > > > > g++ -Wall -O3 -fPIC -c -o test-CacheManager.o test-CacheManager.cc -fopenmp
> > > > > test-CacheManager.cc: In function ‘int main(int, char**)’:
> > > > > test-CacheManager.cc:106:11: warning: unused variable ‘segment_cut_pos’
> > > > > g++ -o test-CacheManager test-CacheManager.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o -fopenmp
> > > > > g++ -Wall -O3 -fPIC -c -o test-Parser.o test-Parser.cc -fopenmp
> > > > > test-Parser.cc: In function ‘int main(int, char**)’:
> > > > > test-Parser.cc:73:11: warning: unused variable ‘seq_len’
> > > > > test-Parser.cc:74:37: warning: unused variable ‘ofile_name’
> > > > > test-Parser.cc:75:10: warning: unused variable ‘ofile_handle’
> > > > > g++ -o test-Parser test-Parser.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o -fopenmp
> > > > > g++ -Wall -O3 -fPIC -c -o test-HashTables.o test-HashTables.cc -fopenmp
> > > > > g++ -o test-HashTables test-HashTables.o counting.o hashbits.o hashtable.o subset.o parsers.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o -fopenmp
> > > > > g++ -Wall -O3 -fPIC -c -o ht-diff.o ht-diff.cc
> > > > > g++ -o ht-diff ht-diff.o counting.o hashtable.o parsers.o read_parsers.o khmer_config.o thread_id_map.o trace_logger.o perf_metrics.o ktable.o zlib/adler32.o zlib/compress.o zlib/crc32.o zlib/gzio.o zlib/uncompr.o zlib/deflate.o zlib/trees.o zlib/zutil.o zlib/inflate.o zlib/infback.o zlib/inftrees.o zlib/inffast.o bzip2/blocksort.o bzip2/huffman.o bzip2/crctable.o bzip2/randtable.o bzip2/compress.o bzip2/decompress.o bzip2/bzlib.o
> > >
> > >
> > > > > make[1]: Leaving directory `/home/heather/khmer-screed/lib'
> > > > > cd python && \
> > > > > make DEFINE_KHMER_EXTRA_SANITY_CHECKS="" \
> > > > > CXX_DEBUG_FLAGS="" \
> > > > > CYTHON_ENABLED_BOOL="False"
> > >
> > > > > make[1]: Entering directory `/home/heather/khmer-screed/python'
> > > > > sed \
> > > > > -e 's/@DEFINE_KHMER_EXTRA_SANITY_CHECKS@//g' \
> > > > > -e 's/@CXX_DEBUG_FLAGS@//g' \
> > > > > -e 's/@CYTHON_ENABLED_BOOL@/False/g' \
> > >
> > > > > > > python setup.py build_ext -i
> > > > > running build_ext
> > >
> > > > > building 'khmer._khmermodule' extension
> > > > > creating build
> > > > > creating build/temp.linux-x86_64-2.7
> > > > > gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I../lib -I/usr/include/python2.7 -c _khmermodule.cc -o build/temp.linux-x86_64-2.7/_khmermodule.o
> > > > > cc1plus: warning: command line option "-Wstrict-prototypes" is valid for Ada/C/ObjC but not for C++
> > > > > In file included from /usr/include/python2.7/Python.h:8:0,
> > > > > from _khmermodule.cc:7:
> > > > > /usr/include/python2.7/pyconfig.h:1155:0: warning: "_POSIX_C_SOURCE" redefined
> > > > > /usr/include/features.h:163:0: note: this is the location of the previous definition
> > > > > /usr/include/python2.7/pyconfig.h:1177:0: warning: "_XOPEN_SOURCE" redefined
> > > > > /usr/include/features.h:165:0: note: this is the location of the previous definition
> > > > > g++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions build/temp.linux-x86_64-2.7/_khmermodule.o ../lib/khmer_config.o ../lib/thread_id_map.o ../lib/trace_logger.o ../lib/perf_metrics.o ../lib/read_parsers.o ../lib/ktable.o ../lib/hashtable.o ../lib/hashbits.o ../lib/counting.o ../lib/subset.o ../lib/zlib/adler32.o ../lib/zlib/compress.o ../lib/zlib/crc32.o ../lib/zlib/deflate.o ../lib/zlib/gzio.o ../lib/zlib/infback.o ../lib/zlib/inffast.o ../lib/zlib/inflate.o ../lib/zlib/inftrees.o ../lib/zlib/trees.o ../lib/zlib/uncompr.o ../lib/zlib/zutil.o ../lib/bzip2/blocksort.o ../lib/bzip2/huffman.o ../lib/bzip2/crctable.o ../lib/bzip2/randtable.o ../lib/bzip2/compress.o ../lib/bzip2/decompress.o ../lib/bzip2/bzlib.o ../lib/storage.hh ../lib/khmer.hh ../lib/khmer_config.hh ../lib/ktable.hh ../lib/hashtable.hh ../lib/counting.hh -L../lib -o /home/heather/khmer-screed/python/khmer/_khmermodule.so
> > >
> > > > > make[1]: Leaving directory `/home/heather/khmer-screed/python'
> > > > > nosetests -v -x
> > > > > make: nosetests: Command not found
> > > > > make: *** [test] Error 127
> > > > >
> > > > >
> > > > >
> > > > > **************
> > > > > $ make all
> > > > > cd lib && \
> > > > > make CXX="g++" CXXFLAGS=" -Wall -O3 -fPIC" LIBS=""
> > >
> > > > > make[1]: Entering directory `/home/heather/khmer-screed/lib'
> > > > > make[1]: Nothing to be done for `all'.
> > > > > make[1]: Leaving directory `/home/heather/khmer-screed/lib'
> > > > > cd python && \
> > > > > make DEFINE_KHMER_EXTRA_SANITY_CHECKS="" \
> > > > > CXX_DEBUG_FLAGS="" \
> > > > > CYTHON_ENABLED_BOOL="False"
> > >
> > > > > make[1]: Entering directory `/home/heather/khmer-screed/python'
> > > > > python setup.py build_ext -i
> > > > > running build_ext
> > >
> > > > > make[1]: Leaving directory `/home/heather/khmer-screed/python'
> > > > >
> > > > > ***********************
> > > > >
> > > > >
> > > > > When trying to run one of the scripts:
> > > > >
> > > > >
> > > > > $ ./load-into-counting.py
> > > > > Traceback (most recent call last):
> > > > > File "./load-into-counting.py", line 16, in <module>
> > > > > from khmer.threading_args import add_threading_args
> > > > > ImportError: No module named threading_args
> > > > >
> > > > >
> > > > > Would you please give me a hint on what is going on here?
> > > > >
> > > > >
> > > > > Cheers,
> > > > > Huan
> > > > >
> > > > > On 01/24/13, Eric McDonald wrote:
> > > > > > Huan,
> > > > > >
> > > > >
> > > > > > Thank you for the nice bug report. I was able to reproduce the
> > > > > > problem with the 'master' (default) branch of the
> > > > > > 'ged-lab/khmer' repository. If this bug is preventing you from
> > > > > > making progress, then I would recommend that you try our "beta tester"
> > > > > > branch, known as 'bleeding-edge':
> > > > > >
> > > > > >
> > > > > > git clone -b bleeding-edge http://github.com/ged-lab/khmer.git khmer-BETA
> > > > > >
> > > > > >
> > > > >
> > > > > > The 'bleeding-edge' branch contains rewritten FASTA and
> > > > > > FASTQ parsers (among other things) and it fixes the problem you have
> > > > > > reported. I get identical results for the raw FASTA and gzip'd FASTA
> > > > > > files using this development branch.
> > > > > >
> > > > > >
> > > > > > However, since 'bleeding-edge' is for beta testing,
> > > > > > there is the possibility that you may find new bugs elsewhere in the code.
> > > > > > Please feel free to file more nice bug reports if you use that branch and
> > > > > > encounter other bugs.
> > > > > >
> > > > > >
> > > > > > Thanks!
> > > > > > Eric
> > > > > >
> > >
> > > > > > On Thu, Jan 24, 2013 at 3:35 PM, Huan Fan <hfan22 at wisc.edu> wrote:
> > > > > >
> > > > > > > Hi Titus,
> > > > > > >
> > > > >
> > > > > > > This is a follow-up on the gzip file issues we discussed last
> > > > > > > month. I've been using "load-into-counting.py" and
> > > > > > > "abundance-dist.py" to calculate the abundance distribution of k-mers in
> > > > > > > some gzip files. Some of the results looked weird, and I suspected it might
> > > > > > > have to do with the files being gzipped. So I made a small test file, and it
> > > > > > > did give different results for the original file and the gzipped one. The test
> > > > > > > files are attached, and the commands I ran and their results are as follows:
> > > > > > >
> > > > > > > for test.fa with k=9
> > >
> > > > > > > $ ./load-into-counting.py -k 9 -N 4 -x 2e9
> > > > > > > result:
> > > > > > > 0 0 0 0.0
> > > > >
> > > > > > > 1 928 928 0.993
> > > > > > > 2 7 935 1.0
> > > > > > >
> > > > > > > for test.fa.gz with k=9
> > > > >
> > >
> > > > > > > $ ./load-into-counting.py -k 9 -N 4 -x 2e9 test_k9_gz.kh test.fa.gz
> > > > > > >
> > > > > > > $ ./abundance-dist.py -s test_k9_gz.kh test.fa.gz test_k9_gz.hist
> > > > > > > result:
> > > > > > > 0 0 0 0.0
> > > > > > > 1 894 894 0.948
> > > > > > > 2 49 943 1.0
> > > > > > >
> > > > > > > According to the data itself, there should be 935 unique
> > > > > > > 9-mers in total, seven of which appear twice, just as in the result for
> > > > > > > test.fa.
> > > > > > >
> > > > > > > Any idea what is going on here?
> > > > > > >
> > > > > > > Thanks very much!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Huan
> > > > > > >
> > > > > > >
> > > > > > > On 12/30/12, "C. Titus Brown" wrote:
> > > > > > > > Excellent, glad to hear it!
> > > > > > > >
> > > > > > > > Yes, we need to invest in more of a documentation effort :)
> > > > > > > >
> > > > > > > > best,
> > > > > > > > --titus
> > > > > > > >
> > > > > > > > On Sun, Dec 30, 2012 at 05:12:35PM +0800, Huan Fan wrote:
> > > > > > > > > Hi Titus,
> > > > > > > > >
> > > > >
> > > > > > > > > This is embarrassing. Yes it does! Sorry, I shouldn't
> > > > > > > > > have assumed that it doesn't merely because the example is given in .fa
> > > > > > > > >
> > > > > > > > > Thanks so much!
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Huan
> > > > > > > > >
> > > > > > > > > On 12/30/12, "C. Titus Brown" wrote:
> > > > > > >
> > > > > > > > > > Hi Huan,
> > > > >
> > > > > > > > > > are you sure it *doesn't* read in gzipped files? It
> > > > > > > > > > should. Which kind of hash table are you using -- counting or bit?
> > > > > > > > > >
> > > > > > > > > > cheers,
> > > > > > > > > > --titus
> > > > > > > > > >
> > > > > > > > > > On Sat, Dec 29, 2012 at 09:09:49PM +0800, Huan Fan wrote:
> > > > > > > > > > > Dear khmer developer(s),
> > > > > > > > > > >
> > > > >
> > > > > > > > > > > First of all, thanks for those useful modules! I would
> > > > > > > > > > > like to use the function ht.consume_fasta in my pipeline (in Python); however,
> > > > > > > > > > > I work with really big files and they are always in gzip format. I am
> > > > > > > > > > > wondering whether it is possible to make ht.consume_fasta able to take gzip
> > > > > > > > > > > files? I don't know C, so I tried to "graft" some relevant code to
> > > > > > > > > > > hashtable.cc but failed. It will be really appreciated if you can make this
> > > > > > > > > > > feature available.
> > > > > > > > > > >
> > > > > > > > > > > Thanks ahead and happy holidays!
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Huan
> > > > > > > > > >
> > > > > > > > > > --
> > > > >
> > >
> > >
> > > > > > > > > > C. Titus Brown, ctb at msu.edu
> > > > > > > >
> > > > > > > > --
> > > > > > > > C. Titus Brown, ctb at msu.edu
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Eric McDonald
> > > > > > HPC/Cloud Software Engineer
> > > > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > > > and the Laboratory for Genomics, Evolution, and Development (GED)
> > > > > > Michigan State University
> > >
> > > > > > P: 517-355-8733
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Eric McDonald
> > > > HPC/Cloud Software Engineer
> > > > for the Institute for Cyber-Enabled Research (iCER)
> > > > and the Laboratory for Genomics, Evolution, and Development (GED)
> > > > Michigan State University
> > > > P: 517-355-8733
> > >
> > >
> > >
> >
> >
> >
> >
> >
> > --
> > Eric McDonald
> > HPC/Cloud Software Engineer
> > for the Institute for Cyber-Enabled Research (iCER)
> > and the Laboratory for Genomics, Evolution, and Development (GED)
> > Michigan State University
> > P: 517-355-8733
>



-- 
Eric McDonald
HPC/Cloud Software Engineer
  for the Institute for Cyber-Enabled Research (iCER)
  and the Laboratory for Genomics, Evolution, and Development (GED)
Michigan State University
P: 517-355-8733