[bip] Indexing big sequence databases

C. Titus Brown ctb at msu.edu
Mon Mar 29 09:15:36 PDT 2010


On Mon, Mar 29, 2010 at 05:12:40PM +0100, James Casbon wrote:
> On 29 March 2010 16:45, Paul Davis <paul.joseph.davis at gmail.com> wrote:
> > You might also be interested in testing Tokyo Cabinet if your queries
> > are limited to "fetch by name" and "iterate over everything." It's
> > treated me pretty well but I've never gone out of my way to benchmark
> > it against other solutions as it was always fast enough.
> 
> I never got round to looking into it, but Tokyo Cabinet actually uses
> BWT to index, right?
> (See http://linux.die.net/man/3/tokyocabinet)
> 
> This means it should be the perfect data store to create a short read
> aligner, right?

...if you were indexing by sequence rather than by sequence name, right?

Good tip, though!  Might be something to use for in-record compression
by screed.
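To make the "by sequence rather than by name" point concrete, here is a toy
Python sketch of a BWT over the sequence text itself, plus FM-index style
backward search that counts substring occurrences. That kind of query is
what a short read aligner needs, and a store keyed on record names cannot
answer it. This is a generic textbook illustration, not how Tokyo Cabinet,
screed, or any particular aligner implements it. The function names and the
'$' sentinel are illustrative assumptions.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic; demo only).

    Assumes `text` contains no '$' and only characters that sort after '$'
    (true for uppercase DNA letters).
    """
    s = text + "$"  # sentinel marks the end of the text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def build_fm_index(bwt_str: str):
    """Return (C, occ) tables for backward search over `bwt_str`."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c]: number of characters in the text that sort before c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i]: occurrences of c in bwt_str[:i] (full table, not sampled)
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern: str, bwt_str: str, C, occ) -> int:
    """Count occurrences of `pattern` by narrowing a range of sorted rotations."""
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):  # backward search consumes the pattern right to left
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("GATTACA")
C, occ = build_fm_index(b)
print(b)                                      # ACTGA$TA
print(count_occurrences("TA", b, C, occ))     # 1
print(count_occurrences("CAT", b, C, occ))    # 0
```

The full `occ` table costs memory proportional to text length times alphabet
size. Practical FM-indexes store only sampled checkpoints and also keep a
(sampled) suffix array so they can report match positions as well as counts.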

--titus
-- 
C. Titus Brown, ctb at msu.edu


