[TIP] Testing multi-threaded applications
Raphael Marvie
raphael.marvie at lifl.fr
Tue May 8 06:24:45 PDT 2007
Working on a multi-threaded "daemon" application, I tried as much as
possible to apply TDD, identifying some issues along the way. Beyond
TDD, the following discussion may apply to testing multi-threaded apps
in general. All the tests pass (but I do not like them that much) and
the application now works, so I would like to share some thoughts /
questions.
The producer / consumer is taken as an example, first because it is
quite classic and well-known and second because our application
follows this pattern, although in our case computations usually take
several hours.
* Implementing a mono-threaded version of a producer / consumer
A mono-threaded version of our application is quite easy to test as
only one action can be taken at a time. The specification is
summarized as: The consumer has to process all the items provided by
the producer through the shared queue. Every item processed by a
consumer is added to a log.
So far nothing states whether the application is supposed to be mono-
or multi-threaded. We may prefer not to go multi-threaded until it is
identified as necessary. Moreover, the functional aspect of my
application in itself is not strongly related to being mono- or
multi-threaded (this can be seen as an implementation issue), thus we
can expect the test to be independent of threads.
<pre>
def test_consumer_processes_all_items_provided_by_producer():
    queue = Queue()
    log = Log()
    items = range(10)
    Producer(queue).produce(items)
    Consumer(queue, log).consume()
    assert queue.empty()
    for item in items:
        assert log.contains(item)
</pre>
Running nosetests shows that everything is fine with a first
"serial" / mono-threaded implementation.
<pre>
$ nosetests mono.py
mono.test_consumer_processes_all_items_provided_by_producer ... ok
----------------------------------------------------------------------
Ran 1 test in 0.007s
OK
</pre>
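The classes used by the test are not shown in this post. As a minimal
sketch, assuming Python 3 and the standard `queue` module, a serial
implementation that satisfies the test could look like the following.
Only the constructor signatures, `produce()`, `consume()` and
`contains()` come from the test; `Log.add()` and all the internals are
assumptions made for illustration.

```python
from queue import Queue, Empty


class Log:
    """Hypothetical log: remembers every item a consumer processed."""

    def __init__(self):
        self._items = []

    def add(self, item):  # assumed name, not taken from the post
        self._items.append(item)

    def contains(self, item):
        return item in self._items


class Producer:
    """Hypothetical serial producer: puts items on the shared queue."""

    def __init__(self, queue):
        self._queue = queue

    def produce(self, items):
        for item in items:
            self._queue.put(item)


class Consumer:
    """Hypothetical serial consumer: drains the queue in the calling thread."""

    def __init__(self, queue, log):
        self._queue = queue
        self._log = log

    def consume(self):
        while True:
            try:
                item = self._queue.get_nowait()
            except Empty:
                return  # queue drained, nothing left to process
            self._log.add(item)
```

Because `consume()` only returns once the queue is empty, the
assertions can be checked right after the call, with no waiting.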
* Towards a multi-threaded version of a producer / consumer
When going multi-threaded, testing becomes more interesting (as it is
less trivial). The test fails as it does not give enough time for all
the threads to complete their task.
Remark /A functional test does not seem to be writable independently
of the threading policy of the application./
Another test has to be written that waits long enough for all the
threads to complete their task before checking the asserts.
<pre>
def test_threaded_consumer_processes_all_items_provided_by_producer():
    queue = Queue()
    log = Log()
    items = range(10)
    Producer(queue).produce(items)
    Consumer(queue, log).consume()
    wait_for_some_time()
    assert queue.empty()
    for item in items:
        assert log.contains(item)
</pre>
We have a first 'ouch' in our TDD approach: how long should I wait
for the test to complete without waiting too much (as when tests take
too long nobody runs them)? My first thought was to wait long enough
for all threads to complete their work even if they were not running
concurrently. However, I cannot estimate this time before writing the
code, thus I cannot go tests first. In addition, I find it fragile to
define my tests according to a factor that varies from one machine to
another.
I finally started writing the code with a test I knew was
"probably" going to fail and "probably" going to pass. Out of a
hundred runs of this test, only 60% went green and 40% went red. How
can one rely on such a non-deterministic test?
Question /How to define a test that can consider a multi-threaded
function to pass or fail? How long should we wait for the threads
in order for them to complete their job?/
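One possible variation on `wait_for_some_time()`, given here only as a
sketch and not as what my tests actually do, is to poll a condition up
to a deadline instead of sleeping for a fixed duration. The helper name
`wait_until` and its `timeout` / `interval` parameters are hypothetical.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate() until it is true or `timeout` seconds have elapsed.

    Returns True as soon as the predicate holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)  # give the worker threads a chance to run
    return predicate()  # one last check once the deadline has passed
```

In the test this would read
`assert wait_until(lambda: all(log.contains(i) for i in items))`.
A passing run then returns as soon as the work is done, but the
question remains open for failures: the timeout is still a
machine-dependent guess.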
Another solution could be to use a join() for the threads in the test
function, so that the assertions are evaluated only when the threads
have terminated. There are other 'ouches' if we vote for this choice.
First, in my daemon application threads are not supposed to
terminate when all the items of the queue are processed. So instead
of waiting for their death we have to define a timeout, which has the
same drawback as the previous approach.
Second, if threads are not supposed to finish, we may ask them
to do so. Exposing the threads out of the consumer is a problem as we
do not want to open sensitive parts of our implementation, nor do we
want client code to be thread dependent. So a method could be added to
the producers and consumers in order to ask them to stop. This method
could then be available in both mono- and multi-threaded versions (and
be a dummy in the first one). But we have not solved the problem of
stating how much time we have to wait before asking the producers and
consumers to stop (considering that if they haven't completed their
task by now, it's too late).
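To make the idea of a stop method concrete, here is a minimal sketch
assuming Python 3 and the standard `threading` and `queue` modules. The
class name `ThreadedConsumer`, the `stop()` method, its `timeout`
parameter and the 0.05 s polling interval are all hypothetical, and the
log is simplified to a plain list.

```python
import threading
from queue import Queue, Empty


class ThreadedConsumer:
    """Hypothetical consumer that runs in its own thread until stop()."""

    def __init__(self, queue, log):
        self._queue = queue
        self._log = log  # assumed: any object with append(), e.g. a list
        self._stop_requested = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def consume(self):
        self._thread.start()  # returns immediately; work happens in the thread

    def _run(self):
        while not self._stop_requested.is_set():
            try:
                item = self._queue.get(timeout=0.05)
            except Empty:
                continue  # nothing to do yet; re-check the stop flag
            self._log.append(item)

    def stop(self, timeout=None):
        """Ask the thread to finish and wait for it (after consume()).

        Returns True if the thread has terminated within `timeout`.
        """
        self._stop_requested.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

The thread itself stays hidden behind `consume()` and `stop()`, but
items still in the queue when `stop()` is called are simply left
unprocessed, so the test still has to decide when calling it is safe.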
Finally, I have found a compromise between time to wait and always
successful tests. However, this compromise is for my configuration
and my machine only. It takes ten times longer to test the
multi-threaded version than the mono-threaded one. While we use
threads to improve performance, the need for waiting in tests makes
them run slower for multi-threaded applications than for
mono-threaded ones.
<pre>
$ nosetests multi.py
multi.test_threaded_consumer_processes_all_items_provided_by_producer ... ok
----------------------------------------------------------------------
Ran 1 test in 0.713s
OK
</pre>
* Remarks and questions
For the moment, I consider my test acceptable if, with a given
waiting time, I can run it a dozen times with no failure. But I find
this approach (a) time consuming, thus bad for tests that won't be run
if they take too long, and (b) too empirical to give me confidence in
my code (as one of my machines is four years old and the other one is
a few months old).
Have you faced such a testing situation, with threads or with
another element, where your tests were non-deterministic? Do you have
any recipes / hints / tips for functional testing or unit testing
multi-threaded applications that you are willing to share?
I have not found much for the moment, and most of it was oriented
towards testing theory, not very pragmatic for my day-to-day testing
activity.
From another perspective than testing, this 'exercise' confirms the
following: If you need a multi-threaded application, start with
threads from the beginning, as it is completely different from a
mono-threaded one in its spirit. (Taking a mono-threaded app and
trying to patch it to be multi-threaded can cost more than rewriting
the application; in addition, the final result will certainly not be
as good when patching.)
Cheers,
r.
--
Raphael Marvie, PhD http://www.lifl.fr/~marvie/
Maître de Conférences / Associate Professor @ LIFL -- IRCICA
Directeur du Master Informatique Professionnel spécialité IAGL
Head of Master's in Software Engineering +33 3 28 77 85 83