[pygr-notify] Issue 45 in pygr: BlastDB has gotten slow due to cache

codesite-noreply at google.com codesite-noreply at google.com
Tue Oct 14 16:23:54 PDT 2008


Issue 45: BlastDB has gotten slow due to cache
http://code.google.com/p/pygr/issues/detail?id=45

New issue report by deepreds:
What steps will reproduce the problem?
1. Create FASTA file with 50 million sequences in it

# Write 50 million 4-base FASTA records with IDs 1..50000000
outfile = open('R1', 'w')
for icount in range(1, 50000001):  # upper bound is exclusive
    outfile.write('>' + str(icount) + '\n')
    outfile.write('ACGT\n')
outfile.close()

2. Open that FASTA file (this also requires building the BlastDB)

from pygr import seqdb
R1 = seqdb.BlastDB('R1')

What is the expected output? What do you see instead?

Opening R1 should be fast, without preloading the sequence IDs. Currently,
however, BlastDB loads every sequence ID into memory, so just opening the
database takes several minutes. That also hurts the performance of NLMSA.
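
To illustrate the expected behavior, here is a minimal sketch of deferred ID
loading. It is not pygr's actual implementation: the class name LazyIDIndex
and the loader function are hypothetical, and for a file this large a real fix
would more likely query an on-disk index than build an in-memory set at all.
The point is only that constructing the object does no ID loading.

```python
class LazyIDIndex(object):
    """Hypothetical stand-in for a sequence ID index (not a pygr class)."""

    def __init__(self, load_ids):
        self._load_ids = load_ids  # callable returning an iterable of IDs
        self._ids = None           # nothing is loaded at construction time

    def _ensure_loaded(self):
        # Load the IDs once, on the first operation that needs them.
        if self._ids is None:
            self._ids = set(self._load_ids())

    def __contains__(self, seq_id):
        self._ensure_loaded()
        return seq_id in self._ids

    def __len__(self):
        self._ensure_loaded()
        return len(self._ids)


calls = []  # records how many times the loader actually ran


def fake_loader():
    # Stand-in for an expensive scan of the sequence file.
    calls.append(1)
    return ['1', '2', '3']


index = LazyIDIndex(fake_loader)
opened_without_loading = (len(calls) == 0)  # "opening" triggered no load
found = '2' in index                        # first lookup triggers the load
loads_after_lookup = len(calls)
```

With this pattern the cost of reading IDs is paid on first use rather than on
open, which matches the "less than 1 sec" behavior expected here.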

Please use labels and text to provide additional information.

1. Version as of August 13.

>>> seqdb.BlastDB('R1')
{}

Less than 1 sec. Returns empty dict.

2. Version as of today.

>>> seqdb.BlastDB('R1')
<BlastDBbase 'R1'>

Took several minutes and loaded all indices into memory.
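
The two timings above could be reproduced with a small helper like the one
below. This is only a sketch: time_call is a hypothetical helper name, and the
example times the built-in dict() so that it runs without pygr installed. With
pygr available, the same helper could presumably be called as
time_call(seqdb.BlastDB, 'R1').

```python
import time


def time_call(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start


# Stand-in call; replace dict with the constructor being measured.
result, seconds = time_call(dict)
```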



Issue attributes:
	Status: New
	Owner: deepreds
	Labels: Type-Enhancement Priority-Critical
