[bip] Blog post on bioinformatics and Python

Ryan Raaum ryan.raaum at gmail.com
Wed Sep 17 13:04:12 PDT 2008


> How come with Python "batteries included" is a good thing,
> but with Biopython it's not?

It's neither universally good nor bad, and I think it can be very
good. But it is also really hard to execute seamlessly. It takes a lot
of work to keep the whole thing moving forward: keeping the
implementation consistent, adding new features, deprecating old ones,
and so on, all at the same time. Even the mainline Python developers
struggle with this from time to time. Overall, I think they have done
very well, but it is hard.

>
> Is the solution like what Zope's been doing - split itself
> into many smaller packages, and distribute them as eggs?

I think so. I think this approach solves a lot more problems than it
creates. Sure, you have to communicate clearly and coordinate among
groups working on different packages, but that's no different from the
communication and coordination that is necessary now in the all-in-one
Biopython. And the more split-up approach really clarifies what
packages/modules are core functionality that needs to be rock-solid,
well documented, and have a stable API, and what parts are more "tip"
packages that can be developed faster, be released more often, and be
more experimental. You can look at the download statistics to see
which packages have only been installed by 3 people in the last year
and which get a lot of use.

>> 2. It is not pure python. I recognize the need for Numeric and C for
>> speed in many circumstances, but having those in the core framework
>> limits where and how it can be used.
>
> Just a history note here.  My memory is hazy, but Jeff Chang
> wrote some code which used a C extension if it was available
> and fell back to pure Python if it wasn't.  Some people complained
> about how slow it was, and it turned out it was a misconfiguration
> that caused only the Python code to be installed.
>
> We decided it was better to get complaints about "it doesn't
> work" than deal with unvoiced "Python is so slow" complaints.

Right. This is a choice you made, and it is the right choice for you.
This is a tough problem that has no ideal solution. My preferred
solution has different problems of its own. Nonetheless, I would prefer
that the documentation for that functionality tell me right up front
in big letters that "This will run slowly if the C extension has not
been compiled. To test if you are using the C extension or the pure
Python version, import module and run module.is_c()". For that matter,
you could have the default Python version loudly note to stderr that
it is going to be slow.
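
To make that concrete, here is a minimal sketch of the pattern I have
in mind. It is not Biopython's actual code: the extension name
_hypothetical_cspeedups and the score() function are made up for
illustration, and only is_c() comes from the suggestion above.

```python
import sys

try:
    # Hypothetical compiled extension; this module name is an assumption
    # made for the sketch, not a real Biopython module.
    from _hypothetical_cspeedups import score as _score
    _USING_C = True
except ImportError:
    _USING_C = False
    # Say so loudly, so "it is slow" becomes a visible configuration
    # problem rather than a silent one.
    print(
        "warning: C extension not found; using the slow pure-Python fallback",
        file=sys.stderr,
    )

    def _score(a, b):
        """Pure-Python fallback: count positions where two sequences match."""
        return sum(1 for x, y in zip(a, b) if x == y)


def is_c():
    """Return True if the compiled extension is in use, False otherwise."""
    return _USING_C


def score(a, b):
    """Public entry point; dispatches to whichever implementation loaded."""
    return _score(a, b)
```

With something like this, a user who suspects a slow install can call
is_c() once instead of guessing, and the stderr note shows up the first
time the module is imported.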

-Ryan


