[bip] Future of bioinformatics in python..?

Andrew Dalke dalke at dalkescientific.com
Fri Aug 3 08:50:17 PDT 2007


On Aug 3, 2007, at 5:13 PM, James Taylor wrote:
> I think a big monolithic project like biopython is inherently
> difficult to maintain.

BTW, there are about 100,000 lines of Python code in Biopython
under Bio/ .  The two biggest files are data files:
     2624 ./SubsMat/MatrixInfo.py
    14499 ./Restriction/Restriction_Dictionary.py
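
For anyone who wants to reproduce a count like this, here is a minimal
sketch. The function names are made up for illustration, and it counts
every raw line (comments and blank lines included), which is only an
assumption about how the numbers above were produced.

```python
import os


def count_py_lines(root):
    """Return a {path: line_count} dict for every .py file under root."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                # Read as bytes so odd encodings in data files cannot fail.
                with open(path, "rb") as handle:
                    counts[path] = sum(1 for _ in handle)
    return counts


def largest_files(counts, n=2):
    """Return the n (line_count, path) pairs with the most lines, biggest last."""
    return sorted((lines, path) for path, lines in counts.items())[-n:]
```

Calling largest_files(count_py_lines("Bio")) from a source checkout would
list the two biggest files, and sum(count_py_lines("Bio").values()) gives
the total.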

My experience has been that 100kLOC is about the
size of "big" packages in biology and chemistry.  That's
about what a small group can develop without having
to worry much about design.

(Just like human groups work okay up to about 50-200 people
before needing to worry about internal organization.)

> Rather than assembling a single "project" I think it is better to
> have a community of small projects. Even my own package ("bx-python")
> is too big I think. Small projects are much easier to maintain, and
> easier for users to adopt.

See, I don't understand why Biopython is considered a
monolithic project.  It really is a collection of different
projects organized under the same namespace.  Many of the parts
can be used elsewhere with, at most, changing the import statements.

This is the "batteries included" model that Python itself has.

The most common dependency is on the Seq module and the
FASTA reader.  There are submodules which don't even import Bio.


I think the problem with having a lot of small modules is
the lack of interoperability across those modules.  Everyone
ends up using different ways to pass around sequence data.
Everyone ends up rewriting parsers.

It's like the bad old days of C++, when every library had
its own string class, so it was a headache doing the
impedance matching.
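
A small sketch of what that impedance matching looks like in practice.
Both "packages" here are hypothetical, invented for this example; neither
is taken from Biopython or bx-python.

```python
# Hypothetical package A hands sequences back as (title, sequence) tuples.
def load_example_records():
    return [("seq1 example record", "GGCCAATT")]


# Hypothetical package B expects dicts with "id" and "residues" keys.
def gc_fraction(record):
    """Fraction of G and C residues in the record (0.0 for an empty one)."""
    residues = record["residues"].upper()
    if not residues:
        return 0.0
    gc = residues.count("G") + residues.count("C")
    return gc / float(len(residues))


# The glue code every user ends up writing to connect the two.
def tuple_to_dict(pair):
    """Convert package A's (title, sequence) tuple into package B's dict."""
    title, sequence = pair
    words = title.split()
    return {"id": words[0] if words else "", "residues": sequence}


def gc_for_all(pairs):
    """Run package B's function over package A's output via the adapter."""
    return [gc_fraction(tuple_to_dict(pair)) for pair in pairs]
```

With N small packages and no agreed representation, the number of
adapters like tuple_to_dict grows with the number of package pairs a
user combines.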

> And fortunately for us, Python now has the perfect infrastructure for
> allowing projects to depend on lots of other small packages.

Yes, that's a useful ability.

Would Biopython be more successful if it were distributed
as a set of modules, dependent only on some small shared
core?
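
To make the question concrete, here is a rough sketch of what such a
small shared core might contain: one record type and one FASTA reader,
with a separately distributed module depending on nothing else. All the
names (Record, read_fasta, reverse_complement) are hypothetical and are
not Biopython's actual API.

```python
from collections import namedtuple

# --- hypothetical shared core: one record type, one reader ---
Record = namedtuple("Record", ["id", "description", "sequence"])


def _make_record(header, chunks):
    """Split a FASTA header into id and description, and join the sequence."""
    words = header.split(None, 1)
    ident = words[0] if words else ""
    description = words[1] if len(words) > 1 else ""
    return Record(ident, description, "".join(chunks))


def read_fasta(handle):
    """Yield Record objects from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


# --- hypothetical add-on module: imports only the core above ---
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(record):
    """Return a new Record with the reverse-complemented DNA sequence.

    Only handles A, C, G, T and N (case-insensitive); anything else
    raises KeyError.
    """
    seq = "".join(_COMPLEMENT[base] for base in reversed(record.sequence.upper()))
    return record._replace(sequence=seq)
```

Because every add-on accepts and returns the core's Record, output from
one module can be passed straight to another without adapter code.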


				Andrew
				dalke at dalkescientific.com




