[khmer] parallelizing reading
C. Titus Brown
ctb at msu.edu
Mon Jul 29 05:19:31 PDT 2013
On Mon, Jul 29, 2013 at 05:10:45AM -0700, C. Titus Brown wrote:
> On Mon, Jul 29, 2013 at 10:57:06AM +0200, Peio Ziarsolo wrote:
> > I have seen that ReadParser can be parallelized and I would like to use
> > it to parallelize the reading of sequence files.
> >
> > I am trying to use it but I don't know how. I have made a small script
> > to test the function:
> >
> > from khmer import ReadParser
> > for i in ReadParser('/home/peio/work_in/bug_parallel/big.fastq', 2):
> >     print i.name
> >
> > But I am not able to make it finish. If I use just one thread it
> > finishes as it should.
> >
> > What am I doing wrong? I am using the bleeding-edge branch.
> >
> > Thanks in advance
> > Peio Ziarsolo
>
> Hi Peio,
>
> there is some example code referenced in here --
>
> http://ivory.idyll.org/blog/multithreaded-read-parsing-in-khmer.html
>
> that should work. Just grab the code from here,
>
> https://gist.github.com/ctb/5328016
>
> and go to town. Briefly, you need to manage your own threading in Python,
> but when you do, ReadParser will support it.
p.s. You can also look at many of the scripts distributed with khmer,
e.g. load-into-counting, but they're more complicated than that test
script.
https://github.com/ged-lab/khmer/blob/bleeding-edge/scripts/load-into-counting.py
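
For reference, the "manage your own threading" pattern can be sketched
roughly as below. This is a minimal sketch, not the code from the gist:
FakeReadParser, consume and read_in_parallel are hypothetical names, and
FakeReadParser is only a stand-in so the example runs without khmer. It
assumes the real ReadParser can be iterated from several threads at once,
which is what the reply above implies. With khmer installed, one would
presumably pass ReadParser(path, n_threads) in its place.

```python
import threading


class FakeReadParser(object):
    """Hypothetical stand-in for khmer's ReadParser (an assumption, not the
    real class): one iterator that several threads may pull from safely."""

    def __init__(self, names):
        self._it = iter(names)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # Serialize access so each record is handed to exactly one thread.
        with self._lock:
            return next(self._it)

    next = __next__  # Python 2 spelling of the iterator protocol


def consume(parser, out, out_lock):
    """Worker body: every thread loops over the SAME parser object."""
    for name in parser:
        with out_lock:
            out.append(name)


def read_in_parallel(parser, n_threads):
    """Start n_threads workers sharing one parser, then wait for all."""
    out = []
    out_lock = threading.Lock()
    threads = [
        threading.Thread(target=consume, args=(parser, out, out_lock))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The point of the sketch is that a single parser is created once and shared,
while the caller starts and joins the worker threads itself. Records arrive
in no particular order, so anything order-sensitive has to be handled
separately.
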
cheers,
--titus
--
C. Titus Brown, ctb at msu.edu