[bip] Indexing big sequence databases

Paul Davis paul.joseph.davis at gmail.com
Mon Mar 29 08:45:14 PDT 2010


On Sun, Mar 28, 2010 at 10:46 PM, C. Titus Brown <ctb at msu.edu> wrote:
> Hi all,
>
> with reference to this earlier thread in November about indexing FASTA
> and FASTQ files,
>
>  http://lists.idyll.org/pipermail/biology-in-python/2009-November/000499.html
>
> I posted an update:
>
>  http://ivory.idyll.org/blog/mar-10/storing-and-retrieving-sequences.html
>
> Basically, taking James Casbon's advice, we've switched to using sqlite as our
> backend for the dirty work of storing sequences.
>
> Comments & random thoughts welcome, as always.
>
> cheers,
> --titus
> --
> C. Titus Brown, ctb at msu.edu

Titus,

You might also be interested in testing Tokyo Cabinet if your queries
are limited to "fetch by name" and "iterate over everything." It's
treated me pretty well, but I've never gone out of my way to benchmark
it against other solutions, as it was always fast enough.
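
To make those two access patterns concrete, here is a rough sketch. It
does not use Tokyo Cabinet's API; it assumes Python's standard-library
dbm module as a stand-in, since Tokyo Cabinet belongs to the same DBM
family of key-value stores. The function names, record names, and
sequences are hypothetical, made up for illustration only.

```python
import dbm
import os
import tempfile


def build_store(path, records):
    """Write (name, sequence) pairs into a dbm file at `path`."""
    with dbm.open(path, "c") as db:
        for name, seq in records:
            # dbm stores bytes, so encode both key and value.
            db[name.encode("ascii")] = seq.encode("ascii")


def fetch_by_name(path, name):
    """Return the sequence stored under `name`, or None if absent."""
    with dbm.open(path, "r") as db:
        value = db.get(name.encode("ascii"))
        return None if value is None else value.decode("ascii")


def iterate_all(path):
    """Yield every (name, sequence) pair; order depends on the backend."""
    with dbm.open(path, "r") as db:
        for key in db.keys():
            yield key.decode("ascii"), db[key].decode("ascii")


# Usage with made-up records in a throwaway directory.
workdir = tempfile.mkdtemp()
store = os.path.join(workdir, "seqs")
build_store(store, [("read1", "ACGT"), ("read2", "GGCC")])
print(fetch_by_name(store, "read1"))
print(sorted(iterate_all(store)))
```

A store like this offers no secondary indexes or ad hoc queries, which
is why it only fits when lookups really are limited to those two
operations.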

Paul Davis


