[bip] agile software development

Andrew Dalke dalke at dalkescientific.com
Tue Jul 31 05:46:22 PDT 2007


On Jul 31, 2007, at 8:10 AM, Titus Brown wrote:
> Still the 30th for me, sorry ;)

The miracle of time zones.  It's only 10 hours until August here.

> -> James mentioned agile development.  For details see the Wikipedia
> -> page at http://en.wikipedia.org/wiki/Agile_software_development
>
> Actually, he said "agile", which (if you want to be pedantic) means
> "Characterized by quickness, lightness, and ease of movement".  The
> inference of Agile-with-a-big-A or agile-with-a-little-a is yours.  
> (see
>
> 	http://steve-yegge.blogspot.com/2006/09/good-agile-bad-agile_27.html
>
> for a very entertaining but incredibly biased and unfair view of the
> difference between A-gile and a-gile.)

Indeed.  A general complaint of mine about XP and Agile and such
is that most of their best practices are identical to, or a subset
of, those in Steve McConnell's "Rapid Development" from
the '90s.  And as various places point out, including
   http://www2.umassd.edu/SWPI/xp/articles/r6047.pdf
(which was linked to from
   http://en.wikipedia.org/wiki/Agile_software_development ),
the design methodology of "iterative, evolutionary and
incremental software development" can be traced back some
decades earlier.

Stevey's rant, which I quite enjoyed, concludes:

     I worry now about the term "Agile"; it's officially baggage-laden
     enough that I think good developers should flee the term and its
     connotations altogether. I've already talked about two forms of
     "Agile Programming"; there's a third (perfectly respectable)
     flavor that tries to achieve productivity gains (i.e. "Agility")
     through technology. Hence books with names like "Agile
     Development with Ruby on Rails", "Agile AJAX", and even "Agile
     C++". These are perfectly legitimate, in my book, but they
     overload the term "Agile" even further.

I took "agile" to mean specifically "Agile development", and
not the other possible meanings.  Note that there's no short,
pithy phrase for "iterative, evolutionary and incremental
software development".  Other than "sane."


> In all seriousness, you raise some interesting points below.  I just
> think you take the most depressing viewpoint on them all!

In the last couple of weeks I:

   - read a journal paper where the authors made a technical assertion
that was demonstrably wrong, and had to hand-wave a potential
problem because they used such a slow implementation that they
couldn't do proper sampling to characterize the error rates.  The
fast implementation was shorter, and its absence suggests that the
authors and the reviewers didn't know how the technology is implemented.

   - reviewed InChI, an open source (but closed development) package
in chemical informatics.  I found several denial-of-service
attacks and one trivially exploitable buffer overflow in
the InChI string parser.  The authors feel that that interface is
relatively unimportant, because they are concerned with the MDL
molfile -> InChI string generation.  Yet there are public
web sites which accept InChI input, and hence are exploitable.  And
after all, no one else has complained.  InChI release cycles
are long, so it will be a while before my fixes are put in.

   - reviewed the OpenBabel code and found a buffer overflow.
That one was already patched in the trunk.

   - reviewed the OpenBabel and CDK SMILES parsers and found
the usual mess of hand-written parser code.  I've been advocating
machine-generated parsers for this task for a while now (a toy
sketch of what I mean follows below).  But few chemists
(computational or otherwise) know how to use those sorts of tools.

Oh, and yes, I'm a parser freak ;)
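
For the curious, here's the flavor of what I mean.  This is only a
toy sketch using the pyparsing module, covering a tiny made-up subset
of SMILES (no bracket atoms, charges, isotopes, or two-digit ring
closures), and it is not how OpenBabel or the CDK actually do things.
But note how the grammar is declared up front instead of being buried
in a few hundred lines of hand-rolled, character-by-character code:

    # A toy sketch, not production code: a grammar-driven parser for a
    # tiny subset of SMILES, built with pyparsing.  The element list,
    # bond symbols, and single-digit ring closures are simplifications.
    from pyparsing import Forward, Group, OneOrMore, Optional, Suppress, oneOf

    atom = oneOf("Cl Br C N O S P F I c n o s")("atom")
    bond = oneOf("- = # :")("bond")
    ring_closure = oneOf("1 2 3 4 5 6 7 8 9")("ring")

    chain = Forward()
    branch = Group(Suppress("(") + chain + Suppress(")"))
    unit = Group(Optional(bond) + atom + Optional(ring_closure))
    chain <<= OneOrMore(unit | branch)

    # Parse cyclohexanol and acetic acid
    print(chain.parseString("C1CCCCC1O").asList())
    print(chain.parseString("CC(=O)O").asList())

The nice part of this style is that the grammar itself is the
documentation, and changing it doesn't mean re-threading state
through a hand-written loop.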

So I'm feeling a bit depressed about software in this field
because there are so few code bases or APIs I look at where
I come out thinking "they did a good job."  That's different
from "there are some capabilities I can use."

I've been trying to figure out what I can do to change things.


> I'm a customer for toolkits -- infrastructure and library packages  
> both.
> My biology bosses, users, and collaborators are customers for the
> end-user analysis tools I produce.

I think you are an exception.  I look at many of the software
packages presented at BOSC (the Bioinformatics Open Source Conference)
or described in the journal Bioinformatics.  In most cases the
developer is the primary customer.  And it's really hard for most
developers to get over the idea that "because I'm a biologist, I can
write software for biologists because I know what we/they want."

> -> But then what happens if/when the software is released?  There's
> -> a new type of customer, who had no influence on the project.
> -> I've heard arguments that "I'm designing it for myself, and
> -> I'm a biologist."  My response is "but if the people using it
> -> were like you they would write the code themselves."
>
> I disagree.  I don't write my own OS, editor, or programming languages
> (much -- I do occasionally hack on Python itself).  Yet these were all
> written by people like me.  Therefore I don't have to do it.  Yay!
> Instead I get to work on marginally more specific solutions to my
> scientific problems.

I didn't follow your response.  Or rather, I think there's a population problem.

I think most biology software, even distributed software, is meant
to solve the problems of the author, and not those of the people who
can be considered the customers.  Because of that, and because it's
something like 2x or 3x harder to make generally reusable software,
the needs of the customers are secondary.

Regarding OS, editors and programming languages, you have such
a huge diversity of projects to choose from - several orders of
magnitude more than any bioinformatics project - that you end
up using ones which do have a strong customer focus.  People aren't
forced to use the ones that are crap.

Quick - how many independent implementations of BLAST are there?
And how many SQL database implementations?  Which is harder to write?

There's my pessimism coming out again.  :)

> -> Indeed, who is "the customer" of an open source biology program?
> -> The user?  (And which kind of user?)  The PI?  The funding
> -> agency?  A user with a problem the developers find interesting?
> -> Agile makes the assumption that the user is the customer is
> -> the person paying the money, but is that often the case for
> -> most software in this field?
>
> No, but in academic biology "effort" replaces money, at least to some
> extent.  We all get paid the same low salary, but I do listen to the
> people who use my software to publish papers, because they've invested
> effort.  They are my customers.

I also think you're an unusual example.

If you didn't listen to your customers, what's the consequence
to you?

Is it better for your career to spend less time doing support and
maintenance, and instead develop new software and publish
new results?  Note that the things you talked about ("learning
better practices, refactoring, testing, and otherwise bettering
my software skills") are developer-focused, not customer-focused.


Why don't most people use the NCBI toolkit?  Using the analysis
framework of "who is the customer?", my belief is that the
toolkit wasn't designed for others.  It was designed for in-house
people to develop tools for others.  The primary customers use
those end-tools, and not the framework.  There's less incentive for
the NCBI developers to rework the framework to make it easier for
external developers, and because of the high barrier to entry,
those external people decide to try some other solution.



> -> Most people in this field are trained as a scientist, and
> -> rarely as a programmer.  How do you learn what is excellent?
> -> How do you learn good design?

> I agree that it's a problem: scientists are usually lousy software
> engineers.  But then, software engineers are usually pretty lousy,  
> too.

Sturgeon's Law: 90% of everything is cr^Wlousy.

> I regard it as a continuous education process, myself; I put a certain
> amount of time and effort into learning better practices, refactoring,
> testing, and otherwise bettering my software skills.

Again, I think you're unusual in this regard.

My crazy thought is to have a "code review" track at some conference.
Bring in your code and review it with others, and review others' code.
Make sure there are some master programmers there who can provide
deep review.

This would be painful for some, if not done right.  Some people
get very disheartened after having all the flaws in their hard
work pointed out.  And some reviewers get a kick out of showing
their superiority ("Why the *@#$ did you do *that*?").  There
needs to be an encouraging, nurturing attitude from the start.

Quoting a relevant part from the Stevey rant:
    There are other incentives. One is that Google is a peer-review
    oriented culture, and earning the respect of your peers means
    a lot there. More than it does at other places, I think. This
    is in part because it's just the way the culture works; it's
    something that was put in place early on and has managed to
    become habitual. It's also true because your peers are so damn
    smart that earning their respect is a huge deal. And it's true
    because your actual performance review is almost entirely based
    on your peer reviews, so it has an indirect financial impact on you.


How do we instill a rapid/iterative/agile ;) "peer-review oriented
culture" in biology software?

Which projects have that?  I think bioperl's development
came closest.

Science has a peer review culture, but it's not in the software.
And the feedback loop through journals is measured in months or years.
It's even worse for independent fools like myself who don't have
ready access to a scientific library.


> -> And how do you do all of this when your primary job (for
> -> grad students and research scientists) is doing science, not
> -> software?
>
> How can one do computational _science_ if one doesn't know how to
> develop software?
>
> (Answer: badly.)
>
> That's how I justify it ;).

The counter is "it's good enough."  "But it works" (The EBI phrase
is 'been used in anger').  "We're scientists here, not programmers."
"If only I had the time to learn all of these things, but I've got a
deadline / funding limitations / ..."

> See http://genomebiology.com/2007/8/2/103 for one significant  
> example of
> computational science gone awry.

All I could read was the abstract.  I remember reading some of the
xtal papers from the early 1980s, back before there was strong
pressure on the crystallographers to release their structures quickly.
The papers were quite thick, I think because the authors wanted to
include everything and had the time to be "more thoughtful" - at the
expense of the entire field waiting an extra year or three.

Did anyone blog about the article?  Not that I can find.  Only 20 more
months until it's public!  Guess I'll get myself over to the library
when I go into town tomorrow.


> Could strict adherence to agile principles, or to waterfall design,
> or whatever buzzword you care to name, have prevented this?  Maybe,
> maybe not.  But that points to the need for more education and more
> effort, not less.

The article abstract mentions nothing about software or the
research method.

I think the reference to "waterfall design", when brought up
in the various pro-agile/xp debates, is a straw man argument.
Very few people use that approach.

I never argue for strict adherence in anything, except recursively.  ;)

Have you come across the term "quality without a name", usually
written "QWAN"?  It has unfortunate Zen connotations.  I think
of it in part as a way to say "but I'm not naming any buzzword."

No methodology, design, or practice is immune to failure.
Ask me to write a next-generation document publishing suite,
and I can tell you now it will fail no matter what methodology
I use.  Ask anyone to write a Star Trek-style AI, and again,
it will fail.

Suppose there were a 1% failure rate on realistic projects.
How many research projects are there in the world?  And how many
of those failures would be interesting enough to report?

> -> Bioperl worked out well, I think, in large part because it
> -> was being used at EBI/Sanger. There were many people working
> -> together on the same project in the same geographic location,
> -> and with the goal of supporting other people.
>
> Perhaps.  I'm not sure.  I do know that Perl is well suited to script
> hacking, but as we move into the era of very large software systems,
> it's becoming increasingly obvious that Perl isn't the answer.  I
> personally think Python is at least part of the answer, and I'm
> investing a fairly large amount of time in it.
>
> Oh, and EBI/Sanger?  One of the main people there told me back in 2004
> that he wished he'd used Jython.  So it's not all roses.

Well, c'mon, this *is* a Python-oriented list so I of course
believe you.

The question is, why did Perl get to the prominence it has in
bioinformatics?  Despite being a lesser language ;)

Perl and the web came along at about the same time.  When I looked at
different ways of doing CGI programming in .. '94? .. the
NCSA http server included examples of CGI programs in various
shell languages, C, perl, and a few other languages.  I reviewed
all of them, learned a bit of perl, and decided that was the
one to go with.

A lot of other people made the same decision.

Bioinformatics does a lot of data sharing, and was involved
with the web early on.  I like to point out that Steven Brenner
wrote cgi-lib.pl (a perl4 library for web development) and
Lincoln Stein did CGI.pm (a perl5 library for the same).  Both
bioinformatics people.

In addition, bioinformatics in the 1980s, before it had that
name, was strongly dominated by Unix developers.  Many of
the early database systems were written for Sybase on SunOS.
There was a history of using unix tools, and perl fits in
very nicely with the unix mindset.

Bioinformatics emerged out of the sequence analysis world,
where most things were oriented around strings, and data
about strings.  Perl was good enough for this - very good
at munging strings, and hashes are enough for most of the
data.

Compare this to chemical informatics, where I spend most of
my time.  It's full of graph structures, where the nodes and
edges in the graph (atoms and bonds) have properties.  This is
much harder to do in Perl than in Python.  And you can see
there's more Python code in chemical informatics than Perl.
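
To make that concrete, here's a rough sketch - purely my own
illustration, not any real toolkit's API - of the
atoms-and-bonds-with-properties structure I mean.  In Python it
falls out naturally from a few small classes; in Perl you end up
juggling hashes of hashes or blessed references to fake the same
thing.

    # A hypothetical sketch of a molecule as a property graph.
    # Atoms are nodes, bonds are edges, and both carry their own data.
    class Atom(object):
        def __init__(self, symbol, charge=0):
            self.symbol = symbol
            self.charge = charge
            self.bonds = []              # bonds incident to this atom

    class Bond(object):
        def __init__(self, atom1, atom2, order=1):
            self.atom1, self.atom2, self.order = atom1, atom2, order
            atom1.bonds.append(self)
            atom2.bonds.append(self)

    class Molecule(object):
        def __init__(self):
            self.atoms = []
            self.bonds = []
        def add_atom(self, symbol, **kwargs):
            atom = Atom(symbol, **kwargs)
            self.atoms.append(atom)
            return atom
        def add_bond(self, atom1, atom2, order=1):
            bond = Bond(atom1, atom2, order)
            self.bonds.append(bond)
            return bond

    # Build ethanol (CCO) and list the neighbors of the middle carbon.
    mol = Molecule()
    c1, c2, o = mol.add_atom("C"), mol.add_atom("C"), mol.add_atom("O")
    mol.add_bond(c1, c2)
    mol.add_bond(c2, o)
    for bond in c2.bonds:
        other = bond.atom2 if bond.atom1 is c2 else bond.atom1
        print(other.symbol, bond.order)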


I do want to point out that I wrote my comments about EBI/Sanger
in the past tense.  What was appropriate for the '90s isn't
necessarily true now.

My point is that systems which are flexible and widely usable,
systems which can be good infrastructure, rarely come from
small, insular developers.  The programmers need to use
"iterative, evolutionary and incremental software development".
The software has to be used in 2 or 3 different contexts.
Only then is there a good chance of being generally successful.

I point out bioperl as an example of a project where there
were enough people, with enough diverse interests, for that
to happen.

In chemical informatics I think the OpenEye software is pretty
good, in large part because it's on its 3rd or 4th rewrite
(depending on how you count), and each rewrite was based on lots
of feedback from paying customers and from dog-fooding.

In modeling, CHARMm is an interesting example of a widely
used code base.  My experience with that comes from the '90s,
when it seemed like half of the modeling world did research in
Karplus's lab at one time or another.

What other projects are like those?

As long as I'm blabbering on, I think Taverna is a counter-
example.  I think it's an example of "throw enough smart
people at a project and they can do it".  I've been told
that getting up to speed enough to write a new Taverna service
takes a newbie several days.  While that might
have changed in the last 6 months, it doesn't make me
feel all warm and fuzzy inside.  I figure it should take
somewhere between a few minutes and 30 minutes.


				Andrew
				dalke at dalkescientific.com




