[bip] agile software development

Titus Brown titus at caltech.edu
Mon Jul 30 23:10:47 PDT 2007


On Tue, Jul 31, 2007 at 03:50:09AM +0200, Andrew Dalke wrote:
-> On Jul 30, 2007, at 5:18 PM, James Taylor wrote:
-> > I think this list is a great idea, and I'm interested in discussing
-> > how to share "infrastructure" in the biology-in-python community.
-> > Particularly, the merits of and problems with monolithic packages
-> > like biopython, and alternative models (e.g. how can we be more agile
-> > while still sharing common interfaces where sensible). But that is a
-> > discussion for another day...
-> 
-> It's another day now. :)

Still the 30th for me, sorry ;)

-> James mentioned agile development.  For details see the Wikipedia
-> page at http://en.wikipedia.org/wiki/Agile_software_development

Actually, he said "agile", which (if you want to be pedantic) means
"Characterized by quickness, lightness, and ease of movement".  The
inference of Agile-with-a-big-A or agile-with-a-little-a is yours. (see

	http://steve-yegge.blogspot.com/2006/09/good-agile-bad-agile_27.html

for a very entertaining but incredibly biased and unfair view of the
difference between A-gile and a-gile.)

In all seriousness, you raise some interesting points below.  I just
think you take the most depressing viewpoint on them all!

-> Can most biology-oriented software development ever be called
-> agile?  I don't think so.
-> 
-> A big thing in agile is:
->    * Customer satisfaction by rapid, continuous delivery of
->        useful software
-> 
-> Who is the customer for most biology software?  In many cases,
-> the customer is the programmer.  This is great, so long as
-> that remains true, which won't be the case for infrastructure
-> projects.

I'm a customer for toolkits -- infrastructure and library packages both.
My biology bosses, users, and collaborators are customers for the
end-user analysis tools I produce.

-> But then what happens if/when the software is released?  There's
-> a new type of customer, who had no influence on the project.
-> I've heard arguments that "I'm designing it for myself, and
-> I'm a biologist."  My response is "but if the people using it
-> were like you they would write the code themselves."

I disagree.  I don't write my own OS, editor, or programming languages
(much -- I do occasionally hack on Python itself).  Yet these were all
written by people like me.  Therefore I don't have to do it.  Yay!
Instead I get to work on marginally more specific solutions to my
scientific problems.

-> Indeed, who is "the customer" of an open source biology program?
-> The user?  (And which kind of user?)  The PI?  The funding
-> agency?  A user with a problem the developers find interesting?
-> Agile assumes that the user is the customer, and that the
-> customer is the person paying the money, but is that the case
-> for most software in this field?

No, but in academic biology "effort" replaces money, at least to some
extent.  We all get paid the same low salary, but I do listen to the
people who use my software to publish papers, because they've invested
effort.  They are my customers.

-> Another aspect of agile is:
->    * Continuous attention to technical excellence and good design
-> 
-> Most people in this field are trained as scientists, and
-> rarely as programmers.  How do you learn what is excellent?
-> How do you learn good design?  How do you justify spending
-> the 2x or 3x more time needed to make a reusable application,
-> compared to a single purpose application?

I agree that it's a problem: scientists are usually lousy software
engineers.  But then, software engineers are usually pretty lousy, too.
I regard it as a continuous education process, myself; I put a certain
amount of time and effort into learning better practices, refactoring,
testing, and otherwise bettering my software skills.

I justify my effort by noting that I write more software, with fewer
bugs, when I pay attention to things like testing and refactoring.  This
results in more papers, more confidence *in* those papers, and more and
better work overall.  I also get reusable software out of it.

The NIH (which is issuing software maintenance grants as we speak)
presumably justifies it by hoping that some good, useful software will
be produced for more general use.

-> And how do you do all of this when your primary job (for
-> grad students and research scientists) is doing science, not
-> software?

How can one do computational _science_ if one doesn't know how to
develop software?

(Answer: badly.)

That's how I justify it ;).

See http://genomebiology.com/2007/8/2/103 for one significant example of
computational science gone awry.  Could strict adherence to agile
principles, or to waterfall design, or whatever buzzword you care to
name, have prevented this?  Maybe, maybe not.  But that points to the
need for more education and more effort, not less.

-> Bioperl worked out well, I think, in large part because it
-> was being used at EBI/Sanger. There were many people working
-> together on the same project in the same geographic location,
-> and with the goal of supporting other people.

Perhaps.  I'm not sure.  I do know that Perl is well suited to script
hacking, but as we move into the era of very large software systems,
it's becoming increasingly obvious that Perl isn't the answer.  I
personally think Python is at least part of the answer, and I'm
investing a fairly large amount of time in it.

Oh, and EBI/Sanger?  One of the main people there told me back in 2004
that he wished he'd used Jython.  So it's not all roses.

cheers,
--titus


