[bip] Pygr questions

Diane Trout diane at caltech.edu
Thu Oct 4 10:37:03 PDT 2007


On Thu, Oct 04, 2007 at 09:55:40AM -0700, Christopher Lee wrote:
> Hi Diane,
> a few answers:
>
> - regarding the "orientation" attribute.  This is actually optional; if you 
> want you can simply use negative (start,stop) coordinates to specify 
> reverse orientation.

OK, I was a bit confused about the proper values for negative coordinates, which is why I stuck with the familiar (start,stop,orientation).

For a 1000 bp sequence, is a (start, stop, orientation) of (800, 900, -1) the same as (-200, -100), i.e. [start-len(seq):stop-len(seq)]? An illustrative example in the pygr reference docs showing how the various coordinate systems match up would have been useful.
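
To make the question concrete, here is a small sketch in plain Python that computes two candidate mappings for the same interval. It does not call pygr at all. Both helper names are hypothetical, and the second mapping (negate and swap) is only my assumption of a plausible alternative convention; which one pygr actually uses is exactly what I am asking.

```python
def offset_by_length(start, stop, seq_len):
    """Candidate 1: Python-style negative indices (subtract the sequence length)."""
    return (start - seq_len, stop - seq_len)


def negate_and_swap(start, stop):
    """Candidate 2 (assumed alternative): reverse-strand interval as (-stop, -start)."""
    return (-stop, -start)


seq_len = 1000  # the 1000 bp sequence from the question
print(offset_by_length(800, 900, seq_len))  # (-200, -100)
print(negate_and_swap(800, 900))            # (-900, -800)
```

The two candidates give different answers for the same input, which is why a worked example in the docs would settle it.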


> s = nlmsa.seqDict.prefixDict[prefix][id]
> You can then slice s any way you want and use it or its slices as a query 
> to the NLMSA.

I wonder if making a slice that way is faster than constructing the string prefix+"."+id?
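
As a rough way to think about it, here is a toy mock-up built on plain dictionaries. ToyPrefixUnion, the "hg18" prefix, and the "chr1" id are all hypothetical and are not pygr's actual classes or data; the only things taken from the thread are the prefixDict[prefix][id] lookup and the prefix+"."+id key. If the real implementation works anything like this, the string route has to concatenate and then split again before doing the same nested lookup.

```python
class ToyPrefixUnion:
    """Hypothetical stand-in for a union of databases keyed by prefix."""

    def __init__(self, prefix_dict, separator="."):
        self.prefixDict = prefix_dict
        self.separator = separator

    def __getitem__(self, key):
        # Split "prefix.id" at the first separator, then do the nested lookup.
        prefix, seq_id = key.split(self.separator, 1)
        return self.prefixDict[prefix][seq_id]


union = ToyPrefixUnion({"hg18": {"chr1": "ACGT" * 5}})

direct = union.prefixDict["hg18"]["chr1"]   # nested lookup, no string work
via_string = union["hg18" + "." + "chr1"]   # build the key, then split it again

print(direct is via_string)
```

Both routes reach the same object in this toy; any real speed difference would have to be measured against pygr itself.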

> It occurs to me we could implement a __hash__ for sequence databases such 
> that any pair of sequence database objects that actually are derived from 
> the same file would be treated as the "same database" for NLMSA.seqDict and 
> other purposes...  In this case you could open the same database separately 
> and it would work for querying the NLMSA... but only if it was exactly the 
> same filepath (which seems fragile, given the prevalence of automount these 
> days, constructing arbitrary path prefixes).  Does that seem worthwhile?

That would've simplified my fumbling around, but if some of the introductory tutorials had illustrated seqDict, I would've used it.
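
For what it's worth, the path-based identity idea could be sketched roughly as below. FileBackedDB and the example file names are hypothetical and this is not pygr code; it only shows __hash__ and __eq__ keyed on a resolved file path, so that two separately opened objects pointing at the same file land on the same dictionary entry.

```python
import os


class FileBackedDB:
    """Hypothetical stand-in for a sequence database opened from a file."""

    def __init__(self, filepath):
        self.filepath = filepath
        # Resolve symlinks and relative components so equivalent paths compare equal.
        self._key = os.path.realpath(filepath)

    def __eq__(self, other):
        if not isinstance(other, FileBackedDB):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self._key)


a = FileBackedDB("genomes/example.fa")
b = FileBackedDB("./genomes/../genomes/example.fa")

lookup = {a: "alignment data"}
print(lookup[b])  # found via b, although the dictionary was keyed with a
```

Note that os.path.realpath only normalizes what the local filesystem can see. The same file reached through two different automount prefixes would still produce two different keys, which is the fragility mentioned above.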

> - currently NLMSA is read-only.  Once built, you can't add more data to it. 
>  We could change this behavior (forcing a rebuild after new data was 
> added), or we could implement a truly dynamic version of NLMSA (using tree 
> structures rather than sorted arrays).

A possibly simpler, but still useful, solution would be some way of combining several existing NLMSAs into a new NLMSA.

Diane 