[bip] SciPy 2007 - Birds of a Feather - Biology

Nathan Harmston iwanttobeabadger at googlemail.com
Fri Aug 17 16:17:46 PDT 2007


>    1. Need to establish a python/biology community, via website,
>    biology-in-python mailing list, RSS, blogs, etc.

That would be a great idea. I think it would be great to be able to post
code/solutions and get them reviewed. Peer review of tools could be a great
way of finding bugs in code and better ways of doing things that are already
being done.

> Having a core set of "interfaces" for handling basic bioinformatics objects
> would allow independent projects to share these basic objects. I am sure
> others will describe this better and in more detail in the near future.
I think this would be a great idea; I think I already mentioned it in a
previous post. If we were to define a small core set of objects (Intervals,
Sequences, Features, Microarrays, Alignments, etc.) that larger projects
can be built around, it would be much easier to integrate Python projects
and increase code reuse across them.
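A minimal sketch of what a shared "interface" for the core objects could look like, assuming modern Python's `typing.Protocol` (3.8+, not something available in 2007) and a hypothetical 0-based, half-open `[start, end)` coordinate convention. `IntervalLike`, `overlaps`, `ProjectAFeature` and `ProjectBProbe` are illustrative names, not an agreed API. The point is that two independent projects can pass their own objects to shared code without importing each other or a common base class.

```python
from dataclasses import dataclass
from typing import Protocol


class IntervalLike(Protocol):
    """Hypothetical core interface: anything with half-open [start, end) coordinates."""

    start: int
    end: int


def overlaps(a: IntervalLike, b: IntervalLike) -> bool:
    """Return True if two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


# Two unrelated "projects" defining their own classes; neither imports the other.
@dataclass
class ProjectAFeature:
    start: int
    end: int
    name: str


@dataclass
class ProjectBProbe:
    start: int
    end: int
    intensity: float


print(overlaps(ProjectAFeature(100, 200, "geneX"), ProjectBProbe(150, 175, 0.8)))  # True
print(overlaps(ProjectAFeature(100, 200, "geneX"), ProjectBProbe(200, 225, 0.3)))  # False
```

Structural typing keeps the coupling light: a project only has to expose `start` and `end`, and a static type checker can verify that it does.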


> I have agreed to set up the python/biology community site. There are some
> ideas in the notes below and I will also be posting ideas and requesting
> ideas for this in a future post.
>

How about a Trac/wiki site with some kind of dpaste-style functionality?

> Chris: We could use some core package that all biology Python packages can
> build off of, while still doing their own thing. This would allow the
> packages to pass data around in a compatible way.
>


> Share [complex] functionality.
>   * graph db/pygr
>     * common interface
>       * sequence
>       * sequence DB
>       * alignment (--> annotation)
>       * (BioPython seq_io)
>
> Parsing (only need one parser per format).
>
> Large analysis management / parallel / cluster processing.
>   * map / reduce impl?
>     l = [x, y, ...]
>     results = map(fn, l)
>     reduce(combine, results)
>   * Parallelization in Python... mailing list.
>

I believe Peter mentioned IPython1 in a previous post; it might meet our
needs for cluster processing.
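The map/reduce idea in the notes can be tried on a single machine before reaching for a cluster tool such as IPython1. This minimal sketch uses the standard library's `multiprocessing.Pool.map` for the map step and `functools.reduce` with `operator.add` for the reduce step. `count_gc`, `total_gc` and the input sequences are hypothetical placeholders for a real per-item analysis.

```python
import operator
from functools import reduce
from multiprocessing import Pool


def count_gc(seq: str) -> int:
    """Map step (hypothetical analysis): count G and C bases in one sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def total_gc(seqs: list[str], processes: int = 2) -> int:
    """Map count_gc over the sequences in worker processes, then reduce by summing."""
    with Pool(processes=processes) as pool:
        per_seq = pool.map(count_gc, seqs)      # map(fn, l), spread across processes
    return reduce(operator.add, per_seq, 0)     # reduce needs a combining function


if __name__ == "__main__":
    seqs = ["ATGC", "GGCC", "atat"]
    print(total_gc(seqs))  # 6
```

The `if __name__ == "__main__"` guard matters: on platforms that start workers by re-importing the main module, unguarded pool creation would recurse.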


> Other people's databases.
>

How about defining something using SQLAlchemy? I know people don't use
BioSQL, but maybe a BioSQL/SQLAlchemy API would work, or how about writing a
PyEnsembl? Would this be a good way to start? It might even convince some
Perl people to convert.
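The BioSQL/SQLAlchemy suggestion might start as a thin, dict-like Python API over a sequence database. To stay self-contained, this sketch swaps in the standard library's `sqlite3` module instead of SQLAlchemy, and uses a heavily simplified, hypothetical two-table schema loosely modelled on BioSQL's `bioentry` and `biosequence` tables (the real BioSQL schema has many more tables and columns). `SequenceDB` and the `TEST0001` record are made up for illustration.

```python
import sqlite3

# Hypothetical, simplified schema loosely modelled on BioSQL (not the real schema).
SCHEMA = """
CREATE TABLE bioentry (
    bioentry_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    accession   TEXT NOT NULL UNIQUE
);
CREATE TABLE biosequence (
    bioentry_id INTEGER PRIMARY KEY REFERENCES bioentry(bioentry_id),
    alphabet    TEXT NOT NULL,
    seq         TEXT NOT NULL
);
"""


class SequenceDB:
    """Hypothetical thin wrapper: store sequences and look them up by accession."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def add(self, accession: str, name: str, seq: str, alphabet: str = "dna") -> None:
        cur = self.conn.execute(
            "INSERT INTO bioentry (name, accession) VALUES (?, ?)", (name, accession)
        )
        # Link the sequence row to the entry just inserted.
        self.conn.execute(
            "INSERT INTO biosequence (bioentry_id, alphabet, seq) VALUES (?, ?, ?)",
            (cur.lastrowid, alphabet, seq),
        )

    def __getitem__(self, accession: str) -> str:
        row = self.conn.execute(
            "SELECT s.seq FROM bioentry e JOIN biosequence s "
            "ON s.bioentry_id = e.bioentry_id WHERE e.accession = ?",
            (accession,),
        ).fetchone()
        if row is None:
            raise KeyError(accession)
        return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
db = SequenceDB(conn)
db.add("TEST0001", "example", "ATGCATGC")  # made-up record
print(db["TEST0001"])  # ATGCATGC
```

An SQLAlchemy version would replace the raw SQL with mapped classes, but the user-facing idea, a Pythonic lookup hiding the schema, stays the same.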

> Problems with BioPython:
>  1) big, sprawling, interconnected.
>  2) poor ... ???
>

If a core set of modules were implemented with a defined API, then we
wouldn't have the big, sprawling problem.

Poor what????


>
>  Where is the community?
>    * mailing list
>

Surely this list is a start?

>    * wiki / website
>    * RSS / blog / planet
>      * extract?
>      * use SciPy
>    * "Don't suck." / easy_install
>      * If you are interested in posting datasets.
>    * Coding standards
>      1) testing
>      2) testing buildbot
>      3) PEP 8 compliance
>                 Definitely
>
>     * Coding standards
>
>       * testing
>       * PEP8 compliance & docstrings
>       * setup.py distutils
>       * make sure they're easily installable
>
>     * if you want to publish your scripts & data, we will be willing
>       to help you host it
>
>
I think it should also be possible to host tests. This could be a good way
of "proving" that the results are "ok" (note the quotes).
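The coding standards in the notes (testing, PEP 8 compliance, docstrings) might look like this for a single function: a PEP 8-named function whose docstring examples double as tests, checked with the standard library's `doctest` module. The module name `seqstats.py` and `gc_content` are hypothetical examples, not an agreed core API.

```python
"""seqstats.py: hypothetical module following the proposed coding standards."""


def gc_content(seq):
    """Return the fraction of G and C bases in ``seq`` (case-insensitive).

    An empty sequence returns 0.0 rather than raising ZeroDivisionError.

    >>> gc_content("ATGC")
    0.5
    >>> gc_content("ggcc")
    1.0
    >>> gc_content("")
    0.0
    """
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # prints nothing when every docstring example passes
```

Running `python -m doctest seqstats.py -v` shows each example as it is checked; hosted tests like these could also be run by a buildbot.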

Documentation of any core set of modules needs to be very good.

The main problem I foresee with creating a core set of modules is convincing
people whose toolkits already use their own "core modules", so adaptors
might be needed to map objects from already-published toolkits onto the
core modules.
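One way the adaptor idea might work, assuming the core modules settle on 0-based, half-open `[start, end)` coordinates: wrap an object from an existing toolkit and translate its conventions, instead of asking that toolkit to change. `LegacyFeature` stands in for a hypothetical third-party class that uses 1-based, inclusive coordinates and different attribute names; `IntervalAdaptor` is likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class LegacyFeature:
    """Hypothetical object from an existing toolkit: 1-based, inclusive coordinates."""

    chrom: str
    begin: int
    stop: int


class IntervalAdaptor:
    """Presents a LegacyFeature through an assumed core Interval interface:
    0-based, half-open [start, end) coordinates plus len()."""

    def __init__(self, wrapped: LegacyFeature):
        self._wrapped = wrapped

    @property
    def start(self) -> int:
        return self._wrapped.begin - 1  # 1-based inclusive start -> 0-based

    @property
    def end(self) -> int:
        return self._wrapped.stop  # an inclusive 1-based end equals a 0-based exclusive end

    def __len__(self) -> int:
        return self.end - self.start


legacy = LegacyFeature("chr1", 1, 10)  # positions 1..10, i.e. 10 bases
iv = IntervalAdaptor(legacy)
print(iv.start, iv.end, len(iv))  # 0 10 10
```

Wrapping rather than copying means the original object stays usable by its own toolkit while shared code sees the core convention.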

I would propose a simple starting point/hierarchy based around intervals
(any comments?).

    Interval
      |
      +-- Feature
      |     +-- Gene
      |     +-- Exon
      |
      +-- Probe       (a list of probes makes an array)
      |
      +-- Sequence    (FASTA-esque)
            +-- Circular Sequence     (ring/plasmid)
            +-- SequenceWithQuality   (FASTQ-esque)
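A minimal sketch of the proposed hierarchy as Python dataclasses (Python 3.7+), assuming 0-based, half-open coordinates on `Interval`. All attribute names (`seqid`, `name`, `probe_id`, `residues`, `qualities`), the `ProbeArray` container and the `base_at` method are hypothetical illustrations, not an agreed API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interval:
    """Base class: a half-open [start, end) span on a named sequence."""
    seqid: str
    start: int
    end: int

    def __post_init__(self):
        if not 0 <= self.start <= self.end:
            raise ValueError("require 0 <= start <= end")

    def __len__(self) -> int:
        return self.end - self.start


@dataclass
class Feature(Interval):
    name: str = ""


@dataclass
class Gene(Feature):
    pass


@dataclass
class Exon(Feature):
    pass


@dataclass
class Probe(Interval):
    probe_id: str = ""


@dataclass
class ProbeArray:
    """A list of probes makes an array."""
    probes: List[Probe] = field(default_factory=list)


@dataclass
class Sequence(Interval):
    """FASTA-like: an interval that carries its residues."""
    residues: str = ""


@dataclass
class CircularSequence(Sequence):
    """Ring/plasmid: positions wrap around the end of the residues."""

    def base_at(self, pos: int) -> str:
        return self.residues[pos % len(self.residues)]


@dataclass
class SequenceWithQuality(Sequence):
    """FASTQ-like: one quality score per residue."""
    qualities: List[int] = field(default_factory=list)

    def __post_init__(self):
        super().__post_init__()  # keep the Interval coordinate check
        if len(self.qualities) != len(self.residues):
            raise ValueError("need one quality score per residue")


plasmid = CircularSequence("pDEMO", 0, 4, residues="ATGC")  # hypothetical plasmid
print(plasmid.base_at(5))  # T (position 5 wraps to index 1)
read = SequenceWithQuality("read1", 0, 3, residues="ACG", qualities=[30, 30, 20])
print(len(read))  # 3
```

Keeping `Interval` as the single root means any code written against `start`, `end` and `len()` works unchanged for genes, probes and sequences alike.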


More information about the biology-in-python mailing list