[bip] Revised "Writing Code that Doesn't Suck"

Titus Brown titus at caltech.edu
Tue Sep 25 18:07:32 PDT 2007


attached.

--titus
-------------- next part --------------
==============================
Writing Code That Doesn't Suck
==============================

Here are some prescriptions for writing Python code that other Python
programmers will find more usable and readable than it might otherwise
have been (i.e. code that "doesn't obviously suck").  This advice is
intended for people writing anything more advanced than isolated code
snippets: anything that you might eventually want to release, for
example, or share with other people.

Follow coding conventions
=========================

Follow the suggested Python style guide, PEP 8
(http://www.python.org/dev/peps/pep-0008/).  Also read the docstring
style guide, PEP 257 (http://www.python.org/dev/peps/pep-0257/).

If you follow these style guides, then other people will be able to
read your code more easily.

Also, try not to override built-in Python objects, e.g. ``str = 5`` or
``id = myOwnObject``.

Organize your code
==================

Be systematic in the way you organize your code.  I tend to use the
following conventions (see
http://ivory.idyll.org/articles/advanced-swc/#modules-and-scripts for
my motivation):

 - module filenames must be valid Python identifiers, but scripts
   should use '-' instead of '_' (which renders them non-importable).

 - use shallow package hierarchies for naming, e.g.

	import pkg.fasta_io

   instead of

            import pkg.seq.readers.file_readers.fasta

   The latter one may seem better organized, but no one will ever
   remember the whole package name!  Note that you can always organize
   your actual files in as deep a hierarchy as you want, while keeping
   the public API shallow and easy to use.

 - modules, when executed from the command line, should either run
   tests or do nothing.

 - provide README.txt, lib/, bin/, doc/, and setup.py.  Everyone knows
   what each of those files/directories contains.

Use setup.py and Distutils
==========================

Provide a setup.py file (see docs.python.org, distutils
documentation), even if it's incomplete.  This will let people install
your code in the standard Pythonic way.

Even better, post your code to the Python Package Index and make sure
that it can be installed with easy_install.  This makes it really
trivial to get the latest "official" version of your package.

Provide Automated Tests
=======================

Python has some excellent built-in testing frameworks, including
unittest and doctest.  Use them to provide at least some simple
automated tests.

Even simple unit tests are really important, and they don't take much
time to write.  Put big/complicated functional tests under tests/; put
smaller/simpler unit tests with the code they're testing.

One very useful technique is to write examples in doctest format:
write a text or reStructuredText document that contains discussion
interspersed with executable code, in the form of commands and their
expected output.  Then run them with doctest, which will tell you if
any of your commands produce unexpected output.  This will keep you
honest and make sure that your examples are always up-to-date.

Provide a simple, obvious and/or standard way to run all of your
automated tests.  Test discovery frameworks like nose or py.test may
help with this.  If you want to provide flexibility in which tests to
run, that's great, but make sure that the default command runs a fast
and useful set of tests.

More advanced advice: use code coverage (look up coverage.py or
figleaf) to determine which sections of your code aren't covered by
your current tests; use continuous integration (buildbot) to run your
tests nightly on several different platforms.

Provide API Documentation
=========================

Put in docstrings; they're a fundamental part of Python.  Adhere to
the docstring "coding standard", PEP 257
(http://www.python.org/dev/peps/pep-0257/).

There are several tools for extracting docstrings and module/package
information and building Web pages out of them; I personally like
epydoc, but there are several other choices out there.  Use one, and
update your HTML docs nightly or weekly as your code changes.

Write some examples.  Was there a particular problem that motivated
you to write your code?  If so, describe how you solved it, in simple
language, with really simple code.

If you have users and you change the API, be sure to tell them, e.g. use
a ChangeLog.

Making Releases
===============

Provide regular zip and/or .tar.gz files of your entire distribution,
with docs and generated files.  Make them available via HTTP and/or
FTP.

Name and number your releases systematically, preferably either by
date (if you haven't formally released the package yet!) or number.
Use something that will stay in approximate lexical order, e.g. ::

   myproject-0.1
   myproject-0.2
   myproject-0.3
   myproject-1.0

Don't go over the number 10, e.g. ``project-0.10`` looks too much like
``project-0.1`` for comfort, both by eye and to the computer.

Don't use ``-latest`` except for snapshots, e.g. if you use
``project-latest.tar.gz`` it should be an automatically generated
snapshot.

Don't use ``-current`` unless it's a symbolic link to the latest
release.  (Preferably, keep old releases in a separate location, like
``OLD/``, and only have one obvious release -- the latest one -- on
the main page or in the main download directory.)

**Always** have a file named ``project-x.y.tar.gz`` unpack into
``project-x.y/``.  Never have it unpack into the current directory,
or into another directory name where it may conflict with an
older/different version of your software.

Release fairly regularly.  It's (much!) better to have a release with
some known bugs than it is to never release anything at all.

Use version control and provide public access
=============================================

Use CVS or Subversion to store and publish your code; everyone has
clients for them and everyone knows how to use them.  If you use darcs
or bazaar-ng or git, that's great -- but provide a nightly snapshot of
your latest source code, so that I don't have to install something in
order to get your latest version.

Copyright and license your code
===============================

Copyright holders are the people who own the code, and they can
license it for use, modification, and redistribution by others.  Make
this information explicit if you want others to use your code!

Provide an explicit copyright notice, so people know who the author is
and when the code was written.  (Make sure you know who actually owns
your code: it may be your employer, not you!)

Provide an explicit license -- we suggest either the BSD, the GPL, or
the LGPL.  (Do your best to avoid writing a new license; it will
confuse people.)  The BSD releases your software for use,
modification, and redistribution by anyone, while the LGPL and GPL
place certain requirements on redistribution.

Conclusions
===========

People often start out programming with their own conventions, or
their teacher's conventions.  In Python, there is really One Right
Way to Do It -- see PEP 8.

Not interested in conforming?  Well, I know you're brilliant and
idiosyncratic and your personal naming conventions, or spacing choice,
or homegrown test framework, are a important signs of your
individuality and creativity -- but unfortunately they're likely to
get in my way when I try to use your code.  What I'm really interested
in is the creativity of your approach and algorithms.  After all, the
point of sharing code is to share your code and your solutions, not
your idiosyncratic approach to programming!  So, if you follow the
above conventions, you have a better chance of providing code that
other people will want to use.

If you need help with getting any of the above things to work, please
just ask.  The Python community is always quite happy to help people
work through issues related to making your code look better, behave
better, and play more nicely with other code.

Contributors
============

C. Titus Brown wrote the first draft.  Brandon King submitted detailed
comments that were incorporated into the second draft.  We welcome
comments or diffs!

(sep25/2007 draft.)


More information about the biology-in-python mailing list