[bip] Indexing big sequence databases
C. Titus Brown
ctb at msu.edu
Mon Mar 29 09:15:36 PDT 2010
On Mon, Mar 29, 2010 at 05:12:40PM +0100, James Casbon wrote:
> On 29 March 2010 16:45, Paul Davis <paul.joseph.davis at gmail.com> wrote:
> > You might also be interested in testing Tokyo Cabinet if your queries
> > are limited to "fetch by name" and "iterate over everything." It's
> > treated me pretty well but I've never gone out of my way to benchmark
> > it against other solutions as it was always fast enough.
>
> I never got round to looking into it, but Tokyo Cabinet actually uses
> BWT to index, right?
> (See http://linux.die.net/man/3/tokyocabinet)
>
> This means it should be the perfect data store to create a short read
> aligner, right?
...if you were indexing by sequence rather than by sequence name, right?
Good tip, though! It might be something screed could use for in-record
compression.
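
To make the "fetch by name" / "iterate over everything" access pattern
and the in-record compression idea concrete, here is a minimal sketch.
It is only an illustration under stated assumptions: it uses Python's
standard-library sqlite3 and zlib as stand-ins, not Tokyo Cabinet or
its BWT codec, and not screed's actual API. The class name
NameIndexedStore and its methods are hypothetical.

```python
import sqlite3
import zlib


class NameIndexedStore:
    """Toy name -> sequence store with per-record zlib compression.

    Hypothetical illustration only; not the screed or Tokyo Cabinet API.
    """

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # The primary key on name gives the "fetch by name" index.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seqs (name TEXT PRIMARY KEY, data BLOB)"
        )

    def add(self, name, sequence):
        # Compress each record on its own (in-record compression).
        blob = zlib.compress(sequence.encode("ascii"))
        self.conn.execute(
            "INSERT OR REPLACE INTO seqs (name, data) VALUES (?, ?)",
            (name, blob),
        )
        self.conn.commit()

    def fetch(self, name):
        row = self.conn.execute(
            "SELECT data FROM seqs WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return zlib.decompress(row[0]).decode("ascii")

    def __iter__(self):
        # "Iterate over everything", decompressing one record at a time.
        cursor = self.conn.execute("SELECT name, data FROM seqs ORDER BY name")
        for name, blob in cursor:
            yield name, zlib.decompress(blob).decode("ascii")


store = NameIndexedStore()
store.add("read1", "ACGT" * 25)
store.add("read2", "GATTACA")
print(store.fetch("read2"))
```

Note that the index here is on the sequence name only; looking records
up by sequence content, as a short read aligner would need, is a
different problem that this sketch does not address.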
--titus
--
C. Titus Brown, ctb at msu.edu
More information about the biology-in-python mailing list