[pygr-notify] [pygr commit] r247 - pygr.Data -> worldbase, except for historic entries

Fri Jun 26 14:23:48 PDT 2009

Author: marecki
Date: Fri Jun 26 14:22:33 2009
New Revision: 247

Added:
    wiki/ServingDataUsingWorldbase.wiki   (contents, props changed)
       - copied, changed from r246, /wiki/ServingDataUsingpygrData.wiki
    wiki/worldbaseIntroduction.wiki   (contents, props changed)
       - copied, changed from r246, /wiki/pygrDataIntroduction.wiki
Removed:
    wiki/ServingDataUsingpygrData.wiki
    wiki/pygrDataIntroduction.wiki
Modified:
    wiki/CodeExamples.wiki
    wiki/DataStorageUsingpygr.wiki
    wiki/MegatestSetup.wiki
    wiki/NlmsaFromAxtPairwise.wiki
    wiki/PygrDocumentation.wiki
    wiki/PygrOnEnsembl.wiki
    wiki/PygrResourceDownloader.wiki
    wiki/QuickOverview.wiki

Log:
pygr.Data -> worldbase, except for historic entries



Modified: wiki/CodeExamples.wiki
==============================================================================

--- wiki/CodeExamples.wiki	(original)
+++ wiki/CodeExamples.wiki	Fri Jun 26 14:22:33 2009
@@ -1,4 +1,4 @@
-#summary Code snippets demonstrating how to accomplish various tasks with Pygr.
+#summary Code snippets demonstrating how to accomplish various tasks with Pygr.

  = Introduction =

@@ -10,10 +10,10 @@
  The following code examples are presently available from the Pygr Wiki:
    * DataStorageUsingpygr
    * GenomeCalculationsUsingpygr
+  * [NlmsaFromAxtPairwise]
    * [LocatingIntergenicRegionsWithinAGenome]
-  * [pygrDataIntroduction]
    * PygrResourceDownloader
    * SearchingforPatterns
-  * ServingDataUsingpygrData
+  * ServingDataUsingWorldbase
    * [SimpleAnnotationDB]
-  * [NlmsaFromAxtPairwise]
\ No newline at end of file
+  * [worldbaseIntroduction]

Modified: wiki/DataStorageUsingpygr.wiki
==============================================================================
--- wiki/DataStorageUsingpygr.wiki	(original)
+++ wiki/DataStorageUsingpygr.wiki	Fri Jun 26 14:22:33 2009
@@ -1,8 +1,8 @@
-#summary Storing data in a MySQL table and pygr.Data
+#summary Storing data in a MySQL table and worldbase

  = Introduction =

-This article is an in-depth explanation of a script in which a genome and the accompanying annotations are manipulated in multiple ways,including in a MySQL table, and then stored in pygr.Data, a resource database. Storing the data this way enables it to be easily manipulated using pygr and prevents potential errors by allowing ease of access to the necessary genomic information.
+This article is an in-depth explanation of a script in which a genome and the accompanying annotations are manipulated in multiple ways,including in a MySQL table, and then stored in worldbase, a resource database. Storing the data this way enables it to be easily manipulated using pygr and prevents potential errors by allowing ease of access to the necessary genomic information.

  *WARNING*: This is a code-example Wiki page and as such _may_ be out of sync with current versions of Pygr. It will be removed or refactored once our doctest infrastructure has been deployed.

@@ -211,22 +211,22 @@
  annot_map.build()
  }}}

-Docstrings are then created for the genome, the annotations, and the annotation map so they may be stored in pygr.Data. pygr.Data requires docstrings to be assigned to every resources stored within, to allow a more descriptive storage of resources and to allow easier access.
+Docstrings are then created for the genome, the annotations, and the annotation map so they may be stored in worldbase. worldbase requires docstrings to be assigned to every resources stored within, to allow a more descriptive storage of resources and to allow easier access.

-Finally, the genome, the annotation, and the annotation map is stored in pygr.Data. Since the annotation map is a schema, its can be stored in pygr.Data as a schema. In order to store schema in pygr.Data, the relationship between the schema must be defined (Many-To-Many or One-To-Many). The annotation map is saved in pygr.Dara first, then again with the schema assignment. When saving the map as schema, the relationship between the schema and the resources it references must also be made clear, and the resources must be available in pygr.Data as well (you must save the genome and annotations along with the annotation map).
+Finally, the genome, the annotation, and the annotation map is stored in worldbase. Since the annotation map is a schema, its can be stored in worldbase as a schema. In order to store schema in worldbase, the relationship between the schema must be defined (Many-To-Many or One-To-Many). The annotation map is saved in pygr.Dara first, then again with the schema assignment. When saving the map as schema, the relationship between the schema and the resources it references must also be made clear, and the resources must be available in worldbase as well (you must save the genome and annotations along with the annotation map).

-bindAttr can have up to three attribute names, although only one is used here. 'annots' is bound to the objects of the source database (the annotations are keys for the annotation map). The pygr.Data resources are then stored to pygr.Data using the save() command, which is essential for any session that modifies or adds pygr.Data resources.
+bindAttr can have up to three attribute names, although only one is used here. 'annots' is bound to the objects of the source database (the annotations are keys for the annotation map). The worldbase resources are then stored to worldbase using the save() command, which is essential for any session that modifies or adds worldbase resources.

  {{{
  genome.__doc__ = 'ecoli genome'
  annots.__doc__ = 'ecoli annotations'
  annot_map.__doc__ = 'annotation map'

-pygr.Data.Bio.Seq.Genome.ecoli = genome
-pygr.Data.Bio.Annotation.ecoli.annotations = annots
-pygr.Data.Bio.Annotation.ecoli.annotationmap = annot_map
-pygr.Data.schema.Bio.Annotation.ecoli.annotationmap = \
-    pygr.Data.ManyToManyRelation(genome,annots,bindAttrs=('annots',))
+worldbase.Bio.Seq.Genome.ecoli = genome
+worldbase.Bio.Annotation.ecoli.annotations = annots
+worldbase.Bio.Annotation.ecoli.annotationmap = annot_map
+worldbase.schema.Bio.Annotation.ecoli.annotationmap = \
+    worldbase.ManyToManyRelation(genome,annots,bindAttrs=('annots',))

-pygr.Data.save()
+worldbase.save()
  }}}

Modified: wiki/MegatestSetup.wiki
==============================================================================
--- wiki/MegatestSetup.wiki	(original)
+++ wiki/MegatestSetup.wiki	Fri Jun 26 14:22:33 2009
@@ -23,7 +23,7 @@

   * [http://somethingaboutorange.com/mrl/projects/nose/ Nose] (megatests haven't been rewritten for the new test framework yet);

- * _(optional)_ A local pygr.Data XML-RPC server, so that the data-download test is not affected by the quality of your connection to the UCLA one;
+ * _(optional)_ A local worldbase XML-RPC server, so that the data-download test is not affected by the quality of your connection to the UCLA one;

   * Sequence data, miscellaneous input and reference output used by megatests; obtaining and installing these will be described below.


Modified: wiki/NlmsaFromAxtPairwise.wiki
==============================================================================
--- wiki/NlmsaFromAxtPairwise.wiki	(original)
+++ wiki/NlmsaFromAxtPairwise.wiki	Fri Jun 26 14:22:33 2009
@@ -75,13 +75,13 @@
  cnestedlist.NLMSA(pathstem=pathstem, mode='w', seqDict=genomeUnion, axtFiles=axtlist, maxlen=536870912, maxint=22369620)
  }}}

-If you are planning to save NLMSA into pygr.Data and never open directly from file, you don't have to give additional options. For example:
+If you are planning to save NLMSA into worldbase and never open directly from file, you don't have to give additional options. For example:

  {{{
-import pygr.Data
+from pygr import worldbase
  msa.__doc__ = "5-way alignment using axt pairwise files"
-pygr.Data.Bio.Alignment.HUMAN.hg18.hg18_pairwise5way = msa
-pygr.Data.save()
+worldbase.Bio.Alignment.HUMAN.hg18.hg18_pairwise5way = msa
+worldbase.save()
  }}}

  However, if you are planning to open NLMSA directly from file, the seqDict should be saved into file by explicitly:

Modified: wiki/PygrDocumentation.wiki
==============================================================================
--- wiki/PygrDocumentation.wiki	(original)
+++ wiki/PygrDocumentation.wiki	Fri Jun 26 14:22:33 2009
@@ -38,7 +38,7 @@
    * [http://bioinfo.mbi.ucla.edu/pygr/docs/ Pygr Versions, Publications, Presentations]

  = Talks =
-  * [http://video.google.com/videoplay?docid=1813952225455171972 Pygr and Pygr.Data talk] at UCLA Bioinformatics retreat ([http://www.doe-mbi.ucla.edu/~leec/talks/UCLA%20Bioinfo08.pdf slides in PDF]): May 2008
+  * [http://video.google.com/videoplay?docid=1813952225455171972 Pygr and pygr.Data talk] at UCLA Bioinformatics retreat ([http://www.doe-mbi.ucla.edu/~leec/talks/UCLA%20Bioinfo08.pdf slides in PDF]): May 2008
    * [http://bioinfo.mbi.ucla.edu/pygr/docs/SciPy07Lee.pdf SciPy 2007 presentation]: August 2007.
    * [http://bioinfo.mbi.ucla.edu/pygr/docs/ISMB2006_PYGR_PPT.pdf ISMB 2006 Software Demo]: includes working examples from pygr tutorials.
    * [http://bioinfo.mbi.ucla.edu/pygr/docs/pygr2005.pdf ISMB 2005 tutorial]: a quick intro to the goals of the Pygr project; this was followed by running various code examples, more or less following the tutorial examples in the docs.

Modified: wiki/PygrOnEnsembl.wiki
==============================================================================
--- wiki/PygrOnEnsembl.wiki	(original)
+++ wiki/PygrOnEnsembl.wiki	Fri Jun 26 14:22:33 2009
@@ -81,7 +81,7 @@

  *-* specialized adaptor classes (subclasses of Pygr's sqlgraph.SQLTable class): provides access to a specific sql table in an ensembl core database

-*-* private module methods: provide automatic saving of the Ensembl database schema to pygr.Data
+*-* private module methods: provide automatic saving of the Ensembl database schema to worldbase

  *3.* the featuremapping module (featuremapping.py): provides mapping between ensembl features


Modified: wiki/PygrResourceDownloader.wiki
==============================================================================
--- wiki/PygrResourceDownloader.wiki	(original)
+++ wiki/PygrResourceDownloader.wiki	Fri Jun 26 14:22:33 2009
@@ -3,42 +3,42 @@
  *WARNING*: This is a code-example Wiki page and as such _may_ be out of sync with current versions of Pygr. It will be removed or refactored once our doctest infrastructure has been deployed.


-One can easily download pre-built pygr.Data resources into your localdisk. Be sure to give writable path before XMLRPC server ('.' in PYGRDATAPATH).
+One can easily download pre-built worldbase resources into your localdisk. Be sure to give writable path before XMLRPC server ('.' in WORLDBASEPATH).

  {{{
    import os
-  os.environ['PYGRDATAPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
-  import pygr.Data
+  os.environ['WORLDBASEPATH'] = '.,http://biodb2.bioinformatics.ucla.edu:5000'
+  from pygr import worldbase

-  pygr.Data.dir('') # RETURNS ALL XMLRPC RESOURCES
-  pygr.Data.dir('', download=True) # RETURNS ALL DOWNLOADABLE RESOURCES
+  worldbase.dir('') # RETURNS ALL XMLRPC RESOURCES
+  worldbase.dir('', download=True) # RETURNS ALL DOWNLOADABLE RESOURCES
  }}}

-For seqdb.BlastDB, you have to setup PYGRDATADOWNLOAD path.
+For seqdb.BlastDB, you have to setup WORLDBASEDOWNLOAD path.

  {{{
-  os.environ['PYGRDATADOWNLOAD'] = '/my/seqdb/path'
+  os.environ['WORLDBASEDOWNLOAD'] = '/my/seqdb/path'

-  hg18 = pygr.Data.Bio.Seq.Genome.HUMAN.hg18(download=True)
+  hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18(download=True)
  }}}

-Above line will initiate downloading and saving hg18 into your PYGRDATADOWNLOAD path.
+Above line will initiate downloading and saving hg18 into your WORLDBASEDOWNLOAD path.

-For NLMSA, you have to setup PYGRDATABUILDDIR path..
+For NLMSA, you have to setup WORLDBASEBUILDDIR path..

  {{{
-  os.environ['PYGRDATABUILDDIR'] = '/my/nlmsa/path'
+  os.environ['WORLDBASEBUILDDIR'] = '/my/nlmsa/path'

-  hg18_multiz28way = pygr.Data.Bio.MSA.UCSC.hg18_multiz28way(download=True)
+  hg18_multiz28way = worldbase.Bio.MSA.UCSC.hg18_multiz28way(download=True)
  }}}

-Above line will initiate downloading and saving hg18_multiz28way into your PYGRDATABUILDDIR path.
+Above line will initiate downloading and saving hg18_multiz28way into your WORLDBASEBUILDDIR path.

  If you don't have huge disk space, don't forget to delete intermediate compressed files and text files.

  Of course, if you delete download=True option, it will access biodb2 XMLRPC resources.

  {{{
-  hg18 = pygr.Data.Bio.Seq.Genome.HUMAN.hg18()
-  hg18_multiz28way = pygr.Data.Bio.MSA.UCSC.hg18_multiz28way()
+  hg18 = worldbase.Bio.Seq.Genome.HUMAN.hg18()
+  hg18_multiz28way = worldbase.Bio.MSA.UCSC.hg18_multiz28way()
  }}}
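
The WORLDBASEPATH value in the hunk above is a comma-separated list in which '.' is placed before the XML-RPC URL. As a rough illustration of that ordering only, the sketch below splits such a value and labels each entry as a local directory or a remote server. The helper names `split_worldbase_path` and `local_path_before_server` and the 'local'/'remote' labels are hypothetical and not part of pygr; how worldbase itself interprets the variable is not shown in this commit.

```python
def split_worldbase_path(value):
    """Split a comma-separated WORLDBASEPATH-style string into (kind, location) pairs.

    Hypothetical helper for illustration; not part of pygr.
    """
    entries = []
    for raw in value.split(','):
        location = raw.strip()
        if not location:
            continue  # ignore empty fields such as a trailing comma
        is_url = location.startswith(('http://', 'https://'))
        entries.append(('remote' if is_url else 'local', location))
    return entries


def local_path_before_server(value):
    """Return the first entry if it is a local path, or None if a server comes first."""
    entries = split_worldbase_path(value)
    if entries and entries[0][0] == 'local':
        return entries[0][1]
    return None


example = '.,http://biodb2.bioinformatics.ucla.edu:5000'
print(split_worldbase_path(example))
print(local_path_before_server(example))
```

With the value from the wiki page, the first entry is the current directory, which is the arrangement the page asks for when it says to give a writable path before the XMLRPC server.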

Modified: wiki/QuickOverview.wiki
==============================================================================
--- wiki/QuickOverview.wiki	(original)
+++ wiki/QuickOverview.wiki	Fri Jun 26 14:22:33 2009
@@ -18,7 +18,7 @@
  == Optional, Recommended ==
  While pygr's core functionality only requires a sane python environment, some specific features require additional software:
  	
-  * MySQL support: allows Pygr to access MySQL databases using its pygr.sqlgraph module.  Also needed for pygr.Data module support for storage of pygr.Data resource databases in MySQL.  Requirements: *MySQL-python (MySQLdb module) >= 1.2.0; works with any server MySQL >= 3.23.x*
+  * MySQL support: allows Pygr to access MySQL databases using its pygr.sqlgraph module.  Also needed for worldbase module support for storage of worldbase resource databases in MySQL.  Requirements: *MySQL-python (MySQLdb module) >= 1.2.0; works with any server MySQL >= 3.23.x*

    * NCBI tools: used by the pygr.seqdb.BlastDB class to provide convenient blast/megablast search.  Requirements: *formatdb, blastall, megablast*, any recent version which you can [http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi download from NCBI]; executables must be in your $PATH.


Copied: wiki/ServingDataUsingWorldbase.wiki (from r246, /wiki/ServingDataUsingpygrData.wiki)
==============================================================================
--- /wiki/ServingDataUsingpygrData.wiki	(original)
+++ wiki/ServingDataUsingWorldbase.wiki	Fri Jun 26 14:22:33 2009
@@ -1,23 +1,23 @@
-#summary Creating an XML-RPC server though pygr.Data
+#summary Creating an XML-RPC server though worldbase

  = Introduction =

-Using pygr.Data to store your resources is especially convenient when attempting to access them remotely through a server, as the unique handles assigned to the data when registered in pygr.Data ensure ease of access. The server used here is an XML-RPC server, a server that encodes the data using XML (Extensible Markup Language) and then HTTP as the data transport method. Creating an XML-RPC server is very simple, and will allow the user to retrieve databases stored in pygr.Data, even from independent computers.
+Using worldbase to store your resources is especially convenient when attempting to access them remotely through a server, as the unique handles assigned to the data when registered in worldbase ensure ease of access. The server used here is an XML-RPC server, a server that encodes the data using XML (Extensible Markup Language) and then HTTP as the data transport method. Creating an XML-RPC server is very simple, and will allow the user to retrieve databases stored in worldbase, even from independent computers.

  *WARNING*: This is a code-example Wiki page and as such _may_ be out of sync with current versions of Pygr. It will be removed or refactored once our doctest infrastructure has been deployed.


  = A Helpful Example =

-First, import pygr.Data, then reference the pygr.Data resource you wish to serve. In this case, the reference is pygr.Data.Bio.Seq.Genome.ECOLI.ecoli. A NLMSA is a data structure used to store the genome/sequence maps. The alignment and sequence databases stored in the NLMSA can currently be accessed by pygr.Data.
+Import worldbase from pygr, then reference the worldbase resource you wish to serve. In this case, the reference is worldbase.Bio.Seq.Genome.ECOLI.ecoli. A NLMSA is a data structure used to store the genome/sequence maps. The alignment and sequence databases stored in the NLMSA can currently be accessed by worldbase.

-Next, the server is assigned a name; this name will be used a layer name within pygr.Data, as well as a port number. The port number can be set to any number that is currently available. Finally, the server can be accessed easily by the URL from any location, as long as the URL is set to the PYGRDATAPATH. The default PYGRDATAPATH is http://biodb2.bioinformatics.ucla.edu:5000, and thus if this remains unchanged, the user will not be able to add or delete resources to/from pygr.Data. Furthermore, the server must be assigned a name (like 'rachel') that will also be used as the layer name for the pygr.Data resource when attempting to access it remotely.
+Next, the server is assigned a name; this name will be used a layer name within worldbase, as well as a port number. The port number can be set to any number that is currently available. Finally, the server can be accessed easily by the URL from any location, as long as the URL is set to the WORLDBASEPATH. The default WORLDBASEPATH is http://biodb2.bioinformatics.ucla.edu:5000, and thus if this remains unchanged, the user will not be able to add or delete resources to/from worldbase. Furthermore, the server must be assigned a name (like 'rachel') that will also be used as the layer name for the worldbase resource when attempting to access it remotely.

-In order to access the newly-created server from a remote location, the server must be set as the PYGRDATAPATH. PYGRDATAPATH searches for pygr.Data resources in three steps: 1) in the current directory; 2) in the home directory; and 3) from the XMLRPC server. It is essential to assign the server as to PYGRDATAPATH, or an error will result. The correct address to give to PYGRDATAPATH would be the URL of your server (http://somehost:1215), with somehost as the server address. Firewalls may be present, and could potentially prevent access to the XML-RPC server, and thus should be addressed as need be.
+In order to access the newly-created server from a remote location, the server must be set as the WORLDBASEPATH. WORLDBASEPATH searches for worldbase resources in three steps: 1) in the current directory; 2) in the home directory; and 3) from the XMLRPC server. It is essential to assign the server as to WORLDBASEPATH, or an error will result. The correct address to give to WORLDBASEPATH would be the URL of your server (http://somehost:1215), with somehost as the server address. Firewalls may be present, and could potentially prevent access to the XML-RPC server, and thus should be addressed as need be.

  {{{
-import pygr.Data
-nlmsa = pygr.Data.Bio.Seq.Genome.ECOLI.ecoli()
-server = pygr.Data.getResource.newServer('rachel', withIndex=True, port=1215)
+from pygr import worldbase
+nlmsa = worldbase.Bio.Seq.Genome.ECOLI.ecoli()
+server = worldbase.getResource.newServer('rachel', withIndex=True, port=1215)
  server.serve_forever()
  }}}
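
The wiki text above describes XML-RPC as encoding data in XML and carrying it over HTTP. The sketch below shows only the encoding half, using the Python 3 standard library module `xmlrpc.client` (named `xmlrpclib` in Python 2) and no pygr code. The method name `getResource` and the resource ID passed as its argument are placeholders chosen for illustration; the calls a worldbase server actually exposes are not shown in this commit.

```python
import xmlrpc.client

# Hypothetical method name and argument, used only to show the wire format.
request_xml = xmlrpc.client.dumps(
    ('Bio.Seq.Genome.ECOLI.ecoli',),
    methodname='getResource',
)

# The request is plain XML text; an XML-RPC client would send it in an HTTP POST body.
print(request_xml)

# Decoding recovers the same parameters and method name from the XML text.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Nothing here opens a network connection, so the snippet runs without a server listening on port 1215.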

Copied: wiki/worldbaseIntroduction.wiki (from r246, /wiki/pygrDataIntroduction.wiki)
==============================================================================
--- /wiki/pygrDataIntroduction.wiki	(original)
+++ wiki/worldbaseIntroduction.wiki	Fri Jun 26 14:22:33 2009
@@ -1,8 +1,8 @@
-#summary A step-by-step example for adding data to pygr.Data
+#summary A step-by-step example for adding data to worldbase

  = Introduction =

-This tutorial introduces pygr.Data, which allows for easy access to multiple datasets by providing a consistent namespace or context for data. This method of data retrieval enables users to manipulate large quantities of data, potentially on multiple machines, without the added worry of ensuring each computer can directly access the various filepaths.  However, it should be noted that pygr.Data is intended for higher-level data resources, such as a MySQL table, BLAST sequence database, or a Python dictionary or shelve, because pygr.Data is purposed to be a “database of databases” rather than a substitute for a database.
+This tutorial introduces worldbase, which allows for easy access to multiple datasets by providing a consistent namespace or context for data. This method of data retrieval enables users to manipulate large quantities of data, potentially on multiple machines, without the added worry of ensuring each computer can directly access the various filepaths.  However, it should be noted that worldbase is intended for higher-level data resources, such as a MySQL table, BLAST sequence database, or a Python dictionary or shelve, because worldbase is purposed to be a “database of databases” rather than a substitute for a database.

  *WARNING*: This is a code-example Wiki page and as such _may_ be out of sync with current versions of Pygr. It will be removed or refactored once our doctest infrastructure has been deployed.

@@ -11,23 +11,21 @@

  The E. coli genome sequence is stored in a BLAST database using seqdb. BLAST (Basic Local Alignment Search Tool) databases are designed for storing sequence alignments.

-pygr.Data is then imported to allow access to the data namespace. This is an essential step, as pygr.Data must be previously imported in order to store or access data from or in it. PYGRPDATAPATH must be set to the directory in which it is located.
+worldbase is then imported to allow access to the data namespace. This is an essential step, as worldbase must be previously imported in order to store or access data from or in it. WORLDBASEPATH must be set to the directory in which it is located.
  In the following step, the data is stored in a container. There are many options for this, including a MySQL table or a BLAST database as seen here.

-Furthermore, assigning a __doc__ string is extremely important, as the data MUST have a __doc__ string, which describes the kind of data it is, so that when a user looks at a directory listing of pygr.Data, he/she can quickly ascertain what data is stored. A __doc__string (documentation string) allows users to easily associate documentation with functions, classes, and modules, which is especially convenient for pygr.Data, since many databases could potentially be stored in it, and documentation ensures clarity and unambiguity.
+Furthermore, assigning a __doc__ string is extremely important, as the data MUST have a __doc__ string, which describes the kind of data it is, so that when a user looks at a directory listing of worldbase, he/she can quickly ascertain what data is stored. A __doc__string (documentation string) allows users to easily associate documentation with functions, classes, and modules, which is especially convenient for worldbase, since many databases could potentially be stored in it, and documentation ensures clarity and unambiguity.

-Finally, the data is stored in pygr.Data using the save() function. In all pygr.Data sessions, it is essential to call the pygr.Data.save() function to ensure all new data that has been added that session is committed. Furthermore, it is imperative to observe the naming conventions for saving data to pygr.Data, since not only does it assign a unique and consistent name to the data, ensuring its easy import, but also since multiple users could be using one pygr.Data database and the data should be clearly organized.
+Finally, the data is stored in worldbase using the save() function. In all worldbase sessions, it is essential to call the worldbase.save() function to ensure all new data that has been added that session is committed. Furthermore, it is imperative to observe the naming conventions for saving data to worldbase, since not only does it assign a unique and consistent name to the data, ensuring its easy import, but also since multiple users could be using one worldbase database and the data should be clearly organized.

  {{{
-from pygr import seqdb
-
-import pygr.Data
+from pygr import seqdb, worldbase

  ecoli = seqdb.BlastDB('/home/mccreary/Projects/pygr/data/CP000802.fna')

  ecoli.__doc__ = 'ecoli genome sequence'

-pygr.Data.Bio.Seq.Genome.ECOLI.ecoli = ecoli
+worldbase.Bio.Seq.Genome.ECOLI.ecoli = ecoli

-pygr.Data.save()
+worldbase.save()
  }}}
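
For readers applying the same rename to their own scripts, the substitutions visible in this diff can be approximated with plain string replacement. This is a sketch under assumptions: the helper `rename_to_worldbase` is hypothetical, the table covers only the names that appear in this commit, and it is not known how the author actually performed the rename. It does not merge imports the way the last hunk does (`from pygr import seqdb, worldbase`), and it makes no attempt to skip historic entries such as the talk title in PygrDocumentation.

```python
# Old name -> new name, taken from the replacements visible in this diff.
# Longer patterns come first so the bare 'pygr.Data' rule does not pre-empt them.
RENAMES = [
    ('PYGRDATADOWNLOAD', 'WORLDBASEDOWNLOAD'),
    ('PYGRDATABUILDDIR', 'WORLDBASEBUILDDIR'),
    ('PYGRDATAPATH', 'WORLDBASEPATH'),
    ('import pygr.Data', 'from pygr import worldbase'),
    ('pygr.Data', 'worldbase'),
]


def rename_to_worldbase(text):
    """Apply the textual renames in order and return the updated text."""
    for old, new in RENAMES:
        text = text.replace(old, new)
    return text


old_script = "import pygr.Data\npygr.Data.Bio.Seq.Genome.ECOLI.ecoli = ecoli\npygr.Data.save()\n"
print(rename_to_worldbase(old_script))
```

A purely textual substitution like this leaves misspellings such as `pygr.Dara` untouched, so the result is worth reviewing by hand.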