[bip] Future of bioinformatics in python..?

Wed Aug 8 10:03:02 PDT 2007

Hi,
The main problems I have with something like Sage include:
1) It is a monolithic project or at least a metaproject, so it depends
on the individual projects.
2) License issues, especially linking to closed-source or proprietary
packages like Matlab and to GPL v3 code. It is not clear which
licenses can be used, where and when. Sage is GPL v2 so it can
only be used in similarly licensed projects.
3) Improvements in Python, from bug and performance fixes to new
syntax (list comprehensions) or features (ElementTree in Python 2.5),
are usually not supported for various reasons, such as lack of support
in the individual projects.
4) Project changes or improvements: numerical Python is a good
example, where Numeric, numarray and now numpy all exist.
5) Problem of maintaining all the interfaces to other packages
(especially maintaining namespaces) in addition to maintaining those
packages.
6) Dependencies between and within individual projects that may even
conflict or have difficulties in building (like SciPy) across
platforms, OSes and distributions/versions.

All these are not really an issue while there is funding, personnel
and interest in all aspects of the project.

A different problem with something like Sage is that the syntax
is different from that of the individual projects. This creates problems
from basic usage to support (e.g., is the error in Sage or not?).

Bruce

On 8/8/07, Peter Clarke <resurgo at gmail.com> wrote:
> I agree completely that large monolithic projects don't work well.
> What I was suggesting was something quite opposite, similar to Sage
> (www.sagemath.org), which acts as a graphical/web front end, but with
> a lot of smaller packages behind it.
>
> See http://www.sagemath.org/packages.html for the packages that Sage
> provides - even biopython is in the optional section.
>
> I was thinking more along the lines of a front end that supports
> graphical sequence/alignment manipulation (bx/pygr backend),
> phylogenies (rpy/bioconductor/phylip/paup backend), network analysis
> (networkx/rpy..) and microarray data analysis (rpy/bioconductor..),
> sloppycell for biochemical network modelling and systems biology
> integration and including biopython, scipy, numpy etc, ipython and
> ipython1 (for clusters) provided as a single downloadable working
> system. A single graphical interface with the ability to call on
> powerful external packages..
>
> Modules for conversion between the different data formats could be
> written to integrate things. With further iterations common data
> standards could emerge out of the project that newer versions of the
> component packages could agree to support.
>
> I think there are enough of the people behind python in biology here
> to drive this. Given the will, and people being able to work together,
> it could lead to python becoming the dominant programming language for
> biology.
>
> Sage works well on the combination of web collaboration and sprints,
> and there are, by now, enough people who understand enough of both
> biology and programming to develop the APIs, data standards etc.
>
> Biology is progressing at a phenomenal rate, which is all the more
> reason to have a platform where people can quickly implement new
> methods using powerful mathematical/bioinformatic components and be
> able to deliver these tools to people who are more comfortable with a
> mouse than the command line. There will always be sequences, networks,
> array type data and the rest.
>
> The problems of scalability are being addressed by the SciPy people.
> Ipython1 is a useful tool for doing stuff on large clusters and these
> tools are only going to become more powerful.
>
> -Peter