[protocols] Eel-pond/annotation questions

C. Titus Brown ctb at msu.edu
Thu Oct 2 04:18:50 PDT 2014


On Tue, Sep 30, 2014 at 03:25:31PM -0500, Jessica Perry Hekman wrote:
> Titus -- thanks for your answers! They were very helpful. Onward:

;)

> On 9/29/14 6:02 AM, C. Titus Brown wrote:
>
>> The older BLAST is functionally equivalent to the later BLAST, and I'm
>> pretty sure all of our code works for the older BLAST.  If and when we
>> update we'll have to put some effort into validation.  So... short answer
>> is that being conservative isn't always bad ;).
>
> Installing the older BLAST was pretty straightforward (phew). Finding it  
> wasn't immediately obvious, but I did dig it up here (URL provided in  
> case you want to add it to the documentation):
>
> ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/blast-2.2.26-x64-linux.tar.gz

Yep -- in the docs, here:

http://khmer-protocols.readthedocs.org/en/latest/mrnaseq/installing-blastkit.html

> I modified the commands to formatdb to build both databases as  
> nucleotide not protein, since (as mentioned in my previous email) I'm  
> using dog transcriptome instead of mouse proteome:
>
> /usr/local/khmer/blast*/bin/formatdb -i mouse.protein.faa -o T -p F
> /usr/local/khmer/blast*/bin/formatdb -i hyp-trans-gg-grouped.fasta -o T -p F
>
> (So -p F instead of -p T in the first command.)
>
> However, when I then ran blastall, it exited immediately without an  
> error message:
>
> /usr/local/khmer/blast-2.2.26/bin/blastall -i mouse.protein.faa -d  
> ~/transcriptome/hypothalamus/references/hyp-trans-gg.fasta -e 1e-3 -p  
> blastx -o dog.x.gg -a 8 -v 4 -b 4
>
> Some debugging determined that it does this silent exit when it can't  
> find the database to query (i.e. if I give it "-d foo" it behaves the  
> same way). Moreover, when I build the mouse.protein.faa database a) with  
> formatdb -p T or b) without specifying -p at all, then the subsequent  
> blastall command DOES run.
>
> Building formatdb with no -p option seems like an acceptable workaround  
> but I'm a little concerned that "formatdb -p F" doesn't work! Maybe this  
> is just a blastall bug and the workaround is fine, though.
>
> Your insights welcomed, but if you have none, then hopefully this is  
> still useful as documentation for the next person to try this approach.  
> When blastall finishes running we'll see if the output is as expected...

The problem here is that all of the commands expect a protein database, and
BLAST treats DNA and protein databases quite differently.  In particular,
blastx translates the query and searches it against a protein database, so
it won't find a database built with formatdb -p F; that is most likely the
silent exit you're seeing.  (I'm also not sure the cutoffs or the scripts
I've provided will work with DNA databases.)  The easiest thing to do might
be to grab the dog proteome, which should be available alongside the
transcriptome somewhere -- at least, UCSC should make it available...
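
Since blastall exits silently on this kind of mismatch, a quick check
before running it may save some debugging.  The following is only a rough
sketch, not part of khmer-protocols: the function names are made up for
illustration, and it assumes a single-volume database, where formatdb -p T
writes .phr/.pin/.psq files next to the FASTA file and formatdb -p F writes
.nhr/.nin/.nsq.

```shell
#!/bin/sh
# Hypothetical helpers (not part of BLAST or khmer-protocols).
# Assumption: single-volume formatdb output sitting next to the FASTA file.

# Report which kind of formatdb index exists for a database path.
blast_db_type() {
    db=$1
    if [ -e "$db.pin" ] && [ -e "$db.nin" ]; then
        echo both
    elif [ -e "$db.pin" ]; then
        echo protein          # built with formatdb -p T
    elif [ -e "$db.nin" ]; then
        echo nucleotide       # built with formatdb -p F
    else
        echo missing
    fi
}

# Report which kind of database a given BLAST program searches.
blast_program_db_type() {
    case $1 in
        blastp|blastx)          echo protein ;;
        blastn|tblastn|tblastx) echo nucleotide ;;
        *)                      echo unknown ;;
    esac
}

# Compare the two and complain loudly instead of exiting silently.
check_blast_db() {
    program=$1
    db=$2
    want=$(blast_program_db_type "$program")
    have=$(blast_db_type "$db")
    if [ "$want" = unknown ]; then
        echo "unknown program: $program"
        return 2
    fi
    if [ "$have" = "$want" ] || [ "$have" = both ]; then
        echo "ok: $program can use $db ($have)"
    else
        echo "mismatch: $program needs a $want database, found $have for $db"
        return 1
    fi
}
```

For example, running "check_blast_db blastx mouse.protein.faa" after
formatdb -p F should report a mismatch, while the same check after
formatdb -p T should report ok.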

best,
--titus


