[khmer] parallelizing reading

C. Titus Brown ctb at msu.edu
Mon Jul 29 05:10:45 PDT 2013


On Mon, Jul 29, 2013 at 10:57:06AM +0200, Peio Ziarsolo wrote:
> I have seen that ReadParser can be parallelized and I would like to use  
> it to  parallelize the reading of sequence files
>
> I am trying to use it but I don't know how. I have made a small script  
> to test the function:
>
> from khmer import ReadParser
> for i in ReadParser('/home/peio/work_in/bug_parallel/big.fastq', 2):
>     print i.name
>
> But I am not able to make it finish. If I use just one thread it  
> finishes as it should.
>
> What am I doing wrong? I am using bleeding-edge branch.
>
> Thanks in advance
> Peio Ziarsolo

Hi Peio,

There is some example code referenced here:

http://ivory.idyll.org/blog/multithreaded-read-parsing-in-khmer.html

that should work.  Just grab the code from here,

https://gist.github.com/ctb/5328016

and go to town.  Briefly, you need to manage your own threading in Python,
but when you do, ReadParser will support it.
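
As a rough illustration of "manage your own threading", here is a minimal
sketch of the pattern: create one shared reader, start the threads yourself,
and let each thread iterate over the same object. SharedReads and
collect_names are hypothetical names invented for this example, and
SharedReads is only a stand-in for a parser that tolerates concurrent
iteration; it is not khmer's API. With khmer you would presumably hand the
ReadParser object to the workers instead, so check that against the gist.

```python
import threading


class SharedReads:
    """Hypothetical stand-in for a parser that several threads may iterate."""

    def __init__(self, items):
        self._it = iter(items)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        # StopIteration propagates and ends the calling thread's for loop.
        with self._lock:
            return next(self._it)


def worker(parser, names, names_lock):
    # Every thread pulls from the same parser; each item goes to one thread.
    for name in parser:
        with names_lock:
            names.append(name)


def collect_names(items, n_threads=2):
    parser = SharedReads(items)
    names, names_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(parser, names, names_lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names
```

The order in which names arrive depends on thread scheduling, so sort the
result if you need it to be deterministic.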

Note that Python itself does not run Python code in parallel because
of the global interpreter lock (GIL). So unless you do your computation
on the sequences in C or C++ and release the lock while doing it, you
won't gain anything by using ReadParser with multiple threads.
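
To make the point about releasing the lock concrete, here is a small sketch
that uses the standard library rather than khmer. hashlib is implemented in
C and releases the GIL while hashing buffers larger than 2047 bytes, so
threads doing that work can overlap. The function names and the synthetic
sequence data are assumptions made up for this illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(seq_bytes):
    # The hashing runs in C; for large buffers the GIL is released meanwhile.
    return hashlib.sha256(seq_bytes).hexdigest()


def digest_all(chunks, n_threads=4):
    # pool.map returns results in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(digest, chunks))


# Synthetic "sequence" buffers of about 1 MB each, all slightly different.
chunks = [(b"ACGT" * 250000) + bytes([65 + i]) for i in range(8)]
digests = digest_all(chunks)
```

If digest were a pure-Python loop over the bases instead, the threads would
take turns holding the GIL and the total time would not improve.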

Here are two discussions of this:

http://pyprocessing.berlios.de/doc/intro.html
http://jessenoller.com/blog/2009/02/01/python-threads-and-the-global-interpreter-lock

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu
