[bip] Bioinformatics software design

Bruce Southey bsouthey at gmail.com
Mon Feb 18 07:58:42 PST 2008


Hi,
I cannot really comment on the first two, but I will anyway :-)

I would suggest you look at numpy and scipy, both for their APIs and
their tests, since these projects may be similar to your case.

1) APIs: The problem with APIs is that they should not change over
the long term. I have seen API changes affect people with the Linux
kernel and, mostly to a limited degree, with Numerical Python. One of
the many great things about Python is that it aims to be backwards
compatible (okay, Python 3000 will not be), so it is important that
APIs are designed to be rather flexible and are then left unchanged
once finalized. Sure, in a small group these can change, but you don't
really want to have to debug some old code only to realize that an API
change was the reason that it doesn't work.
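
As a rough illustration of keeping an old API alive while moving to a
new one, here is a minimal sketch. The function names gc_content and
calc_gc are made up for this example and do not come from any real
package; the only real API used is the standard library warnings
module.

```python
import warnings


def gc_content(sequence):
    """New, preferred name (hypothetical): fraction of G and C bases."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def calc_gc(sequence):
    """Old name (hypothetical), kept so that existing callers still work."""
    # Warn instead of breaking: old scripts keep running, and the warning
    # tells whoever maintains them which call to switch to.
    warnings.warn(
        "calc_gc() is deprecated; use gc_content() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return gc_content(sequence)
```

The point is only that the old entry point stays in place and
delegates to the new one, so old code does not silently stop working.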

2) Tests: Certain basic tests can be useful while developing if core
pieces will get changed. The main uses that I really see for tests are
those rare, special and weird cases that you don't see every day, as
well as porting to other environments (32 vs 64 bit). Also, there is
some noise on the SciPy list because of the version of nose required
to get those tests to work.
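
To make the 'weird cases' idea concrete, here is a small sketch using
only the standard library unittest and random modules. The functions
mean_length and simulate_counts are hypothetical stand-ins, not part
of any real project. The last two tests show one possible way to
handle the stochastic case from the original question: fix the seed
for reproducibility, and check properties that must hold for any run.

```python
import random
import unittest


def mean_length(sequences):
    """Hypothetical helper: average length of a list of sequences."""
    if not sequences:
        raise ValueError("need at least one sequence")
    return sum(len(s) for s in sequences) / len(sequences)


def simulate_counts(n_steps, rng):
    """Hypothetical stochastic model: a random walk that never drops below 0."""
    count = 0
    history = []
    for _ in range(n_steps):
        count = max(0, count + rng.choice((-1, 1)))
        history.append(count)
    return history


class WeirdCaseTests(unittest.TestCase):
    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            mean_length([])

    def test_empty_sequence_counts_as_zero_length(self):
        self.assertEqual(mean_length(["", "AC"]), 1.0)

    def test_same_seed_gives_same_run(self):
        first = simulate_counts(50, random.Random(42))
        second = simulate_counts(50, random.Random(42))
        self.assertEqual(first, second)

    def test_counts_never_negative(self):
        history = simulate_counts(200, random.Random(7))
        self.assertTrue(all(c >= 0 for c in history))
```

Saved in a file, these can be run with 'python -m unittest'.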

3) Developing software: I found that developing for the web (two
bioinformatics examples, only for reference: the Student Interface to
the Biology Workbench and NeuroPred) makes many things easier because
you are limited in what can be provided. It keeps the software easy to
use and flexible, as the user typically just 'points and clicks'. You
can also change the underlying code, for example to fix bugs, without
having to make extensive releases because the code exists in a single
place. Also, anyone with a web connection can use it wherever and
whenever they want (obviously assuming server availability).

One nice thing is that you can use HTML to design the whole
interface somewhat independently of the actual code. This made me
really appreciate Python because it was very easy and quick to rip out
one interface and replace it with another. In another case, I was able
to create a new version under an existing interface. The downsides of
the web are the limits imposed by HTML (viewable screen space is very
important) and the risk of the interface getting too complex. If you
can write the interface with Web 2.0/AJAX then you can hide some of
the complexity until requested.
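
A minimal sketch of keeping the HTML separate from the code, assuming
nothing beyond the standard library (string.Template and html.escape).
The names PAGE, analyse and render are invented for this example, and
a real site would more likely load the template from a file or use a
web framework.

```python
from html import escape
from string import Template

# The page layout lives apart from the analysis code, so it can be
# redesigned or swapped out without touching analyse().
PAGE = Template(
    "<html><body>\n"
    "<h1>$job_name</h1>\n"
    "<p>Sequence length: $length</p>\n"
    "</body></html>\n"
)


def analyse(sequence):
    """Hypothetical analysis step; knows nothing about presentation."""
    return {"length": len(sequence)}


def render(job_name, sequence, page=PAGE):
    """Fill a page template with the analysis result."""
    result = analyse(sequence)
    # Escape user-supplied text before putting it into HTML.
    return page.substitute(job_name=escape(job_name), length=result["length"])
```

Swapping the interface then amounts to passing a different template,
for example render("demo", "ATGC", page=Template("$job_name: $length bp")).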

Balancing a simple interface against program complexity/flexibility is
very tricky, so first create the interface that you want and that
covers everything. You probably will not get any useful feedback
without providing some sort of working prototype. Don't expect the
'Biologists' to help with the interface because, unless they
understand the problem and are prepared to learn the software, you are
unlikely to get useful comments. You may be better off asking informed
people, like this list, who can understand that ideas need to be
implemented - ideally people who can provide you with a version of the
interface that they would like.

Perhaps the biggest problem you will face is that you want to maximize
the flexibility, but this will create a very complex interface for the
user. Consequently you end up with multiple options presented as a
series of questions, for which the user typically only uses the
defaults. So you need to address both the standard user, who wants a
quick answer or doesn't know the program, and the power user, who
needs to change the default settings.
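
One way to serve both kinds of user in a Python API is sensible
defaults with keyword-only overrides. The sketch below is only an
illustration: simple_score and its parameters are hypothetical, and
the scoring is deliberately trivial (position-by-position comparison
plus a penalty for any length difference), not a real alignment
algorithm.

```python
def simple_score(seq_a, seq_b, *, match=1, mismatch=-1, gap=-2):
    """Hypothetical toy scorer with defaults a standard user never touches."""
    # Compare the two sequences position by position over the shared length.
    score = sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))
    # Penalize every position that only one of the sequences has.
    score += gap * abs(len(seq_a) - len(seq_b))
    return score


# Standard user: just call it and accept the defaults.
quick = simple_score("ACGT", "ACGA")

# Power user: override only the setting that matters to them.
tuned = simple_score("ACGT", "AC", gap=0)
```

Making the options keyword-only keeps the simple call short while
still letting the power user name exactly what is being changed.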

With regard to overall development, I would suggest implementing as
much as possible in Python to minimize any dependencies. Look at numpy
and scipy in particular for possible functionality and speed - Pyrex
and company may also help with speed. Then you can determine if you
really need to write extra C/C++ modules or connect to R. Really try
to map out as much as possible because it will help you see 'the wood
for the trees'.
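
Before deciding that a C/C++ module is needed, it may help to measure
the plain Python version first. This is a minimal sketch using the
standard library timeit module; count_kmers and seconds_per_call are
hypothetical names made up for the example, and the actual timings
will depend on your machine and data.

```python
import timeit


def count_kmers(sequence, k=3):
    """Hypothetical pure-Python routine: count overlapping k-mers."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def seconds_per_call(func, *args, repeat=3, number=10):
    """Best-of-several timing of func(*args), in seconds per call."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


# Time the pure-Python version on a made-up test sequence.
elapsed = seconds_per_call(count_kmers, "ACGT" * 1000)
```

If the measured time is acceptable for realistic input sizes, the
extra dependency is probably not worth it yet.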

Licensing is also very important, especially if you are going to
distribute the code in any form. For example, R is GPL v2, so if you
distribute your complete code it will also have to be GPL v2! So you
had better address this with your group/bosses/lawyers etc. before you
get too far into the process. It is better to start with this in mind
so you can easily work around potential problems as you go. Otherwise,
you may have to rewrite various pieces or start from scratch if you
end up using software with incompatible licenses.

Regards
Bruce

On Feb 17, 2008 4:32 PM, Nathan Harmston
<iwanttobeabadger at googlemail.com> wrote:
> Hi,
>
> For one of my projects I/a group of us will be developing some Systems
> Biology modelling software by integrating Python, C/C++ using Ctypes and
> maybe doing some R integration. And although it hasn't started I was
> thinking about the design of the "API" and the methodology to be used as
> well. So I thought I'd ask a few questions:
>
> 1. What would you consider to be a good "API"?
>           well-documented, intuitive, best guess works, pythonic, do people
> have any examples of what they consider a good API/project?
>
> 2. Testing?
>           although I believe testing is very important I have not really
> gone for a hardcore TDD approach before and am thinking I should do it on
> this project. What frameworks do people suggest are useful, and how would
> you test a function whose output was random/stochastic modelling, since it
> is obviously random?
>
> 3. Would anyone like to suggest any problems they've found in developing
> software for the Bioinformatics/Systems Biology user? I don't like pretty
> interfaces and prefer to keep it simple and powerful and unfortunately
> biologists like pretty things.
>
> Any other comments/ideas are welcome.
>
> Many Thanks
>
> Nathan
> _______________________________________________
> biology-in-python mailing list - bip at lists.idyll.org.
>
> See http://bio.scipy.org/ for our Wiki.
>


