[khmer] parallelizing reading

C. Titus Brown ctb at msu.edu
Mon Jul 29 05:19:31 PDT 2013


On Mon, Jul 29, 2013 at 05:10:45AM -0700, C. Titus Brown wrote:
> On Mon, Jul 29, 2013 at 10:57:06AM +0200, Peio Ziarsolo wrote:
> > I have seen that ReadParser can be parallelized and I would like to use
> > it to parallelize the reading of sequence files
> >
> > I am trying to use it but I don't know how. I have made a small script  
> > to test the function:
> >
> > from khmer import ReadParser
> > for i in ReadParser('/home/peio/work_in/bug_parallel/big.fastq', 2):
> >     print i.name
> >
> > But I am not able to make it finish. If I use just one thread it  
> > finishes as it should.
> >
> > What am I doing wrong? I am using bleeding-edge branch.
> >
> > Thanks in advance
> > Peio Ziarsolo
> 
> Hi Peio,
> 
> there is some example code referenced in here --
> 
> http://ivory.idyll.org/blog/multithreaded-read-parsing-in-khmer.html
> 
> that should work.  Just grab the code from here,
> 
> https://gist.github.com/ctb/5328016
> 
> and go to town.  Briefly, you need to manage your own threading in Python,
> but when you do, ReadParser will support it.

p.s. You can also look at many of the scripts distributed with khmer,
e.g. load-into-counting.py, but they're more complicated than that test
script.

https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/load-into-counting.py
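
For reference, here is a minimal sketch of what "manage your own threading"
can look like. This is not the code from the gist. It assumes that one parser
object is shared by several worker threads and that each read goes to exactly
one of them. LockedIterator, consume_in_threads and record_name are
hypothetical names made up for this illustration, and LockedIterator only
stands in for the parser so that the sketch runs without khmer installed.

```python
import threading


class LockedIterator:
    """Hypothetical stand-in for a thread-safe parser.

    Each item is handed to exactly one of the threads iterating over it.
    """

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Only one thread at a time may advance the underlying iterator.
        with self._lock:
            return next(self._it)


def consume_in_threads(parser, n_threads, handle_read):
    """Start n_threads workers that all iterate over the same parser."""

    def worker():
        for read in parser:
            handle_read(read)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    # Wait until every worker has run out of reads.
    for t in threads:
        t.join()


# Demo with fake read names instead of a FASTQ file.
names = []
names_lock = threading.Lock()


def record_name(read):
    # Appending under a lock keeps the shared list consistent.
    with names_lock:
        names.append(read)


fake_reads = ["read%d" % i for i in range(1000)]
consume_in_threads(LockedIterator(fake_reads), 2, record_name)
```

With khmer, the parser would presumably be built as in the script quoted
above, ReadParser(path, 2), and passed in place of LockedIterator, with the
thread count given to ReadParser matching the number of worker threads
(an assumption on my part, not something checked against the library).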

cheers,
--titus
-- 
C. Titus Brown, ctb at msu.edu