<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

Hi All,<br>

<br>

Last night we held a biology Birds of a Feather (BoF) session at SciPy
2007. Below are notes from both Diane and me. To
summarize, two general themes came up during the evening:<br>

<ol>

  <li>We need to establish a Python/biology community via a website,
a biology-in-python mailing list, RSS feeds, blogs, etc.</li>

  <li>Having a core set of "interfaces" for handling basic

bioinformatics objects would allow independent projects to share these

basic objects. I am sure others will describe this better and in more

detail in the near future.</li>

</ol>

I have agreed to set up the Python/biology community site. There are
some ideas in the notes below, and I will post more ideas, and ask for
yours, in a future message. <br>

<br>

Enthought has agreed to host our community site. We can either use a
scipy.org subdomain such as bio.scipy.org or choose our own domain
name, such as biologyinpython.org. Any thoughts or preferences on
which one we should use?<br>

<br>

I will let others jump in and describe the details and interests from
the meeting.<br>

<br>

-Brandon King<br>

<br>

<br>

-------------------- Brandon King's Notes -----------------------------<br>

Birds of a Feather: Biology<br>

---------------------------<br>

<br>

Chris: We could use a core package that all biology Python packages can
build on while still doing their own thing. This would allow the packages to
pass data around in a compatible way.<br>

<br>

Share [complex] functionality.<br>

&nbsp; * graph db/pygr<br>

&nbsp;&nbsp;&nbsp; * common interface<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * sequence<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * sequence DB<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * alignment (--&gt; annotation)<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * (BioPython seq_io)<br>

<br>

Parsing (only need one parser per format).<br>

<br>

Large analysis management / parallel / cluster processing.<br>

&nbsp; * map / reduce impl?<br>

&nbsp;&nbsp;&nbsp; l = [x, y, ...]<br>
&nbsp;&nbsp;&nbsp; partials = map(fn, l)<br>
&nbsp;&nbsp;&nbsp; reduce(combine, partials)<br>

&nbsp; * Parallelization in Python... mailing list.<br>
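A minimal sketch of the map/reduce idea above, using only the Python standard library. The k-mer counting task and the helper names (count_kmers, merge_counts, map_reduce) are hypothetical, chosen just to show the shape: a map step runs independently on each input, and reduce folds the partial results together with a combining function. Threads are used here only so the sketch runs anywhere; a real cluster setup would swap in a process pool or a job scheduler.<br>
<pre wrap="">
```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


# Hypothetical example task: count k-mers across many sequences.
def count_kmers(seq, k=3):
    """Map step: count the overlapping k-mers in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def merge_counts(a, b):
    """Reduce step: combine two partial k-mer counts."""
    return a + b


def map_reduce(fn, items, combine, initial, max_workers=4):
    """Apply fn to each item concurrently, then fold the results with combine.

    Threads keep this sketch self-contained; CPU-bound work would normally
    use a process pool or a cluster job scheduler instead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(fn, items))
    return reduce(combine, partials, initial)


sequences = ["ACGTACGT", "ACGAAC", "TTACG"]
totals = map_reduce(count_kmers, sequences, merge_counts, Counter())
print(totals["ACG"])  # prints 4
```
</pre>
Because merge_counts only ever combines two partial results, the same reduce step works whether the partials come from threads, processes, or separate cluster jobs.<br>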

<br>

Other people's databases.<br>

<br>

<br>

Problems with BioPython:<br>

&nbsp;1) big, sprawling, interconnected.<br>

&nbsp;2) poor automated testing<br>

<br>

Parsing Issues:<br>

---------------<br>

<br>

&nbsp;* BLAST<br>
&nbsp;* HMMER<br>

&nbsp;<br>
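Since each format should only need one parser, here is a rough sketch of a small, reusable BLAST parser. It assumes BLAST's 12-column tabular output (legacy blastall -m 8, or -outfmt 6 in BLAST+) rather than the default text report, which is much harder to parse; parse_blast_tabular and BlastHit are hypothetical names, not Biopython's API. HMMER output would need its own parser.<br>
<pre wrap="">
```python
from collections import namedtuple

# Hypothetical record type for the 12 standard columns of BLAST tabular
# output (legacy "blastall -m 8", BLAST+ "-outfmt 6").
BlastHit = namedtuple(
    "BlastHit",
    "qseqid sseqid pident length mismatch gapopen "
    "qstart qend sstart send evalue bitscore",
)

# Converters applied column by column, in the same order as BlastHit's fields.
_CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_blast_tabular(lines):
    """Yield one BlastHit per non-blank, non-comment line of tabular output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and commented headers ("-m 9" / "-outfmt 7")
        fields = line.split("\t")
        if len(fields) != len(_CONVERTERS):
            raise ValueError("expected 12 tab-separated columns, got %d" % len(fields))
        yield BlastHit(*(conv(value) for conv, value in zip(_CONVERTERS, fields)))


example = (
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t370.0\n"
    "# a comment line\n"
    "query1\tsubjB\t75.00\t120\t30\t2\t10\t129\t5\t124\t2e-20\t95.5\n"
)
hits = list(parse_blast_tabular(example.splitlines()))
print(hits[0].sseqid, hits[0].evalue)
```
</pre>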

&nbsp;Where is the community?<br>

&nbsp;&nbsp; * mailing list<br>

&nbsp;&nbsp; * wiki / website<br>

&nbsp;&nbsp; * RSS / blog / planet<br>

&nbsp;&nbsp;&nbsp;&nbsp; * extract?<br>

&nbsp;&nbsp;&nbsp;&nbsp; * use SciPy<br>

&nbsp;&nbsp; * "Don't suck." / easy_install<br>

&nbsp;&nbsp;&nbsp;&nbsp; * If you are interested in posting datasets, we can help host them.<br>

&nbsp;&nbsp; * Coding standards<br>

&nbsp;&nbsp;&nbsp;&nbsp; 1) testing<br>
&nbsp;&nbsp;&nbsp;&nbsp; 2) testing with buildbot<br>
&nbsp;&nbsp;&nbsp;&nbsp; 3) PEP 8 compliance<br>

<br>

<br>

Tutorials / entry documentation:<br>

--------------------------------<br>

&nbsp;<br>

&nbsp;* Redoing a paper's analysis in Python.<br>

&nbsp;* How to distribute/write/host small projects (eggs)<br>

&nbsp;<br>

&nbsp;<br>

<br>

Common Theme:<br>

-------------<br>

<br>

A core interface lets programs play well together while the
implementations change underneath. This keeps the interface independent
of the storage.<br>

<br>
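A minimal sketch of the "core interface, swappable storage" idea, assuming a hypothetical SequenceDB interface with just two methods (ids and get_sequence). These names are invented for illustration; they are not pygr's or Biopython's API, nor an interface the group agreed on. Two storage backends implement it, and gc_content works with either one without knowing which it received.<br>
<pre wrap="">
```python
from abc import ABC, abstractmethod


class SequenceDB(ABC):
    """Hypothetical core interface: any store mapping IDs to sequences."""

    @abstractmethod
    def ids(self):
        """Return a list of the sequence identifiers in the store."""

    @abstractmethod
    def get_sequence(self, seq_id):
        """Return the sequence string for seq_id, or raise KeyError."""


class DictSequenceDB(SequenceDB):
    """Storage backend 1: an in-memory dict."""

    def __init__(self, records):
        self._records = dict(records)

    def ids(self):
        return list(self._records)

    def get_sequence(self, seq_id):
        return self._records[seq_id]


class FastaSequenceDB(SequenceDB):
    """Storage backend 2: FASTA text, parsed once on construction."""

    def __init__(self, fasta_text):
        self._records = {}
        current = None
        for line in fasta_text.splitlines():
            line = line.strip()
            if line.startswith(">"):
                current = line[1:].split()[0]  # ID is the first word of the header
                self._records[current] = []
            elif line and current is not None:
                self._records[current].append(line)

    def ids(self):
        return list(self._records)

    def get_sequence(self, seq_id):
        return "".join(self._records[seq_id])


def gc_content(db, seq_id):
    """Client code written only against SequenceDB, not a concrete backend."""
    seq = db.get_sequence(seq_id).upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


memory_db = DictSequenceDB({"seq1": "GGCCAT"})
fasta_db = FastaSequenceDB(">seq1 example\nGGC\nCAT\n")
print(gc_content(memory_db, "seq1"), gc_content(fasta_db, "seq1"))
```
</pre>
A new backend (a relational database, an indexed flat file, a remote service) would only need to implement the same two methods for existing client code to keep working.<br>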

<br>

Databases:<br>

&nbsp;* NCBI eutils, etc.<br>

&nbsp;* Gene ontology<br>

&nbsp;* Mammalian Phenotype Ontology<br>

&nbsp;* UCSC/Ensembl<br>

&nbsp;* Integr8<br>

&nbsp;* Textpresso?<br>

&nbsp;<br>
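As a small example of working with other people's databases, here is a sketch that builds NCBI E-utilities request URLs (ESearch and EFetch) with the standard library. The helper names are hypothetical, and the sketch only builds URLs; actually fetching them (for example with urllib.request.urlopen) should follow NCBI's E-utilities usage guidelines.<br>
<pre wrap="">
```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, retmax=20):
    """Hypothetical helper: ESearch URL returning IDs that match a query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Hypothetical helper: EFetch URL downloading records for a list of IDs."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # E-utilities take comma-separated IDs
        "rettype": rettype,
        "retmode": retmode,
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


print(esearch_url("pubmed", "biopython"))
print(efetch_url("nucleotide", [123, 456]))
```
</pre>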

<br>

Agreements:<br>

-----------<br>

&nbsp;* Brandon has agreed to set up the biology-in-python community website,
etc.<br>

<br>

<br>

<br>

---------------------- Diane Trout's Notes --------------------------------<br>

<pre wrap="">* Introductions...

  * Industry, 2

  * Academic, 10

  * Unknown, 1

* What should we do?

  * Work on a common software

  * Work on a common api, or at least define a common api

* Sharing complex functionality

  * Graph Database

  * Sequence Database, common API

    * common interface to the standard bioinformatics types

      * Like sequence

* parsing (only need one parser per format)

  * BLAST

  * HMMER


  * Biopython too monolithic

* Large Analysis Management Parallel/Cluster processing

  * Map/Reduce impl

* Other people's databases

  * NCBI Eutils

  * Gene Ontology

  * mammalian phenotype ontology

  * UCSC/ENSEMBL

  * integr8

  * raw textpresso database available (lexicons)

* Missing Data

* Microarray Formats

  * R-BioConductor

* Problems with BioPython

  * Big, Sprawling, Interconnected

  * Poor Automated Testing

  * unpythonic

  * seems low-hanging fruit

* Python software

* Where is the community

  * Mailing List  

  * Wiki

  * RSS/Blog/Planet

    * bioinformatics.org

    * use scipy

  * Inclusivity

  * how to distribute/write/share small projects

  * "Don't Suck"

    * Coding standards

      * testing

      * PEP8 compliance &amp; docstrings 

      * setup.py distutils

      * make sure they're easy_install-able

    * if you want to publish your scripts &amp; data, we will be willing

      to help you host it

  * Tutorials

    * Entry documentation

    * Good things in BioPython

      * Intro to using their BLAST parser

      * Cookbook

      * How to do the analysis of the paper in python

* One person argued that we shouldn't split things into too many fragments</pre>

<br>

</body>

</html>