[TIP] Testing multi-threaded applications

Tue May 8 06:24:45 PDT 2007

Working on a multi-threaded "daemon" application, I tried as much as
possible to apply TDD, and ran into a few issues. Beyond TDD, the
following discussion may apply to testing multi-threaded apps in
general. All the tests pass (though I do not like them that much) and
the application now works, so I would like to share some thoughts /
questions.

I use the producer / consumer pattern as an example, first because it
is classic and well known, and second because our applications follow
this pattern, with computations that usually take several hours.

* Implementing a mono-threaded version of a producer / consumer

A mono-threaded version of our application is quite easy to test, as
only one action can happen at a time. The specification can be
summarized as follows: the consumer has to process all the items
provided by the producer through the shared queue, and every item
processed by the consumer is added to a log.

So far, nothing states whether the application is supposed to be mono-
or multi-threaded, and we may prefer not to go multi-threaded until it
is identified as necessary. Moreover, the functional behavior of my
application is not strongly tied to being mono- or multi-threaded
(this can be seen as an implementation detail), so we can expect the
test to be independent of threads.

<pre>
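	# Queue, Log, Producer and Consumer are the application's classes.
	def test_consumer_processes_all_items_provided_by_producer():
	    queue = Queue()
	    log = Log()
	    items = range(10)
	    Producer(queue).produce(items)   # fill the shared queue
	    Consumer(queue, log).consume()   # process and log every item
	    assert queue.empty()
	    for item in items:
	        assert log.contains(item)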
</pre>

Running nosetests shows that everything is fine with a first "serial" /
mono-threaded implementation.

<pre>
$ nosetests mono.py
mono.test_consumer_processes_all_items_provided_by_producer ... ok

----------------------------------------------------------------------
Ran 1 test in 0.007s

OK
</pre>
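For reference, here is a minimal sketch of a first serial implementation that makes this test pass. Only the test comes from the application. The bodies of Log, Producer and Consumer, the hypothetical Log.add() method, and the use of the standard library queue.Queue (module `Queue` in Python 2) are my assumptions, for illustration only.

```python
from queue import Queue  # stdlib FIFO queue (module "Queue" in Python 2)


class Log:
    """Records every item processed by a consumer (hypothetical API)."""

    def __init__(self):
        self._entries = []

    def add(self, item):
        self._entries.append(item)

    def contains(self, item):
        return item in self._entries


class Producer:
    def __init__(self, queue):
        self.queue = queue

    def produce(self, items):
        # Push every item onto the shared queue, in the caller's thread.
        for item in items:
            self.queue.put(item)


class Consumer:
    def __init__(self, queue, log):
        self.queue = queue
        self.log = log

    def consume(self):
        # Drain the queue synchronously: when consume() returns, every
        # queued item has been processed and logged.
        while not self.queue.empty():
            item = self.queue.get()
            self.process(item)
            self.log.add(item)

    def process(self, item):
        pass  # stand-in for the real (hours-long) computation


def test_consumer_processes_all_items_provided_by_producer():
    queue = Queue()
    log = Log()
    items = range(10)
    Producer(queue).produce(items)
    Consumer(queue, log).consume()
    assert queue.empty()
    for item in items:
        assert log.contains(item)
```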

* Towards a multi-threaded version of a producer / consumer

When going multi-threaded, testing becomes more interesting (as it is
less trivial). The same test now fails, because it does not give the
threads enough time to complete their tasks before the assertions are
checked.

	Remark /A functional test does not seem to be writable independently
	of the threading policy of the application./
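To see why the unchanged test fails, here is a minimal sketch of a threaded consumer. It assumes (hypothetically) that consume() starts a daemon worker thread and returns immediately. The ThreadedConsumer name and the simulated 0.01s computation are illustrative assumptions, not the actual application code.

```python
import threading
import time
from queue import Queue


class Log:
    """Thread-safe record of processed items (worker threads write to it)."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            self._entries.append(item)

    def contains(self, item):
        with self._lock:
            return item in self._entries


class ThreadedConsumer:
    def __init__(self, queue, log):
        self.queue = queue
        self.log = log

    def consume(self):
        # Start the worker and return at once: the caller does not wait.
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:  # a daemon consumer never stops on its own
            item = self.queue.get()
            time.sleep(0.01)  # simulated computation
            self.log.add(item)


if __name__ == "__main__":
    queue, log, items = Queue(), Log(), range(10)
    for item in items:
        queue.put(item)
    ThreadedConsumer(queue, log).consume()
    # Checked right away, without waiting: the worker has barely started.
    print(all(log.contains(item) for item in items))
```

Right after consume() returns, the worker has usually processed few or no items, so asserting on the log at that point fails most of the time. Note also that queue.empty() can already be true while the last item is still being processed. It is therefore not a reliable completion signal in the threaded version.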

Another test has to be written, one that waits long enough for all the
threads to complete their tasks before checking the assertions.

<pre>
	def test_threaded_consumer_processes_all_items_provided_by_producer():
	    queue = Queue()
	    log = Log()
	    items = range(10)
	    Producer(queue).produce(items)
	    Consumer(queue, log).consume()   # returns once the threads are started
	    wait_for_some_time()             # placeholder: but how long?
	    assert queue.empty()
	    for item in items:
	        assert log.contains(item)
</pre>

We hit a first 'ouch' in our TDD approach: how long should I wait for
the threads to complete without waiting too long (tests that take too
long are tests nobody runs)? My first thought was to wait long enough
for all threads to complete their work even if they did not run
concurrently. However, I cannot estimate this time before writing the
code, so I cannot write the tests first. In addition, I find it
fragile to base my tests on a factor that varies from one machine to
another.
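One way to soften the 'how long' question is sketched below, under my own assumptions. Instead of sleeping for a fixed delay, the test polls the expected outcome and stops as soon as it holds, with a generous upper bound. A passing test then only waits as long as the threads actually need, and the full timeout is only paid when the test is about to fail. The wait_until() helper and its default values are hypothetical.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.01):
    """Poll condition() until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # yield to the worker threads between checks
```

In the threaded test, wait_for_some_time() and the loop over the log could then become `assert wait_until(lambda: all(log.contains(item) for item in items))`. The timeout still has to be chosen. However, it can be large (several times what the slowest machine needs) without slowing down green runs.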

I finally started writing the code with a test that I knew would
"probably" fail and "probably" pass. Out of a hundred runs of this
test, only 60 went green and 40 went red. How can one rely on such a
non-deterministic test?
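A small helper can rerun a test function and count the outcomes, which puts a number on this kind of flakiness (like the 60/40 split above). It is a sketch, and run_repeatedly() is a hypothetical name. It counts only AssertionError as a failure, so genuine errors still surface.

```python
def run_repeatedly(test, runs=100):
    """Run `test` `runs` times and return (passes, failures)."""
    passes = failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:  # a red run; other exceptions propagate
            failures += 1
        else:
            passes += 1
    return passes, failures
```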

	Question /How can we define a test that reliably decides whether a
	multi-threaded function passes or fails? How long should we wait for
	the threads to complete their job?/

Another solution could be to join() the threads in the test function,
so that the assertions are evaluated only once the threads have
terminated. This choice brings other 'ouch's.
     First, in my daemon application the threads are not supposed to
terminate when all the items of the queue have been processed. So
instead of waiting for them to die, we have to define a timeout, which
has the same drawback as the previous approach.
     Second, if the threads are not supposed to finish on their own, we
may ask them to. Exposing the threads outside of the consumer is a
problem, as we want neither to open up sensitive parts of our
implementation nor to make client code thread-dependent. So a method
could be added to the producers and consumers to ask them to stop.
This method could then be available in both the
mono- and multi-threaded versions (as a no-op in the former). But we
still have not solved the problem of how long to wait before asking
the producers and consumers to stop (given that if they have not
completed their task by then, it is too late).
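A minimal sketch of the stop() idea follows. It assumes a single consumer and a producer that has finished producing before stop() is called. stop() enqueues a sentinel behind the items already produced, so the worker processes all of them in FIFO order before it sees the sentinel and exits. join() then waits exactly as long as needed, and its timeout is only a safety net against a hung thread, not a tuned delay. The class names, the sentinel and the plain list used as a log are assumptions for illustration. With several consumers, one sentinel per consumer would be needed.

```python
import threading
from queue import Queue

_STOP = object()  # sentinel that can never be mistaken for a real item


class ThreadedConsumer:
    """Hides its worker thread behind consume()/stop()."""

    def __init__(self, queue, log):
        self.queue = queue
        self.log = log
        self._thread = threading.Thread(target=self._run, daemon=True)

    def consume(self):
        self._thread.start()

    def stop(self, timeout=None):
        # FIFO order: the sentinel is handled only after every item queued
        # before it, so join() returns as soon as all of them are processed.
        self.queue.put(_STOP)
        self._thread.join(timeout)  # timeout only guards against a hung worker
        return not self._thread.is_alive()

    def _run(self):
        while True:
            item = self.queue.get()
            if item is _STOP:
                return
            self.log.append(item)  # stand-in for processing + logging


class MonoConsumer:
    """Serial version: same interface, stop() has nothing to do."""

    def __init__(self, queue, log):
        self.queue = queue
        self.log = log

    def consume(self):
        while not self.queue.empty():
            self.log.append(self.queue.get())

    def stop(self, timeout=None):
        return True  # consume() already did all the work


def check_consumer(consumer_class):
    """The same functional test, whatever the threading policy."""
    queue, log = Queue(), []
    items = list(range(10))
    for item in items:
        queue.put(item)
    consumer = consumer_class(queue, log)
    consumer.consume()
    assert consumer.stop(timeout=5.0)
    assert queue.empty()
    assert log == items
    return log
```

The same check_consumer() test runs unchanged against both versions. This keeps the test, like client code, independent of threads.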

Finally, I found a compromise between the time to wait and
consistently successful tests. However, this compromise holds for my
configuration and my machine only. Testing the multi-threaded version
takes roughly a hundred times longer than testing the mono-threaded
one (0.713s versus 0.007s). While we use threads to improve
performance, the need to wait in the tests makes them run slower for
multi-threaded applications than for mono-threaded ones.

<pre>
$ nosetests multi.py
multi.test_threaded_consumer_processes_all_items_provided_by_producer ... ok

----------------------------------------------------------------------
Ran 1 test in 0.713s

OK
</pre>
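In the daemon case, threads never terminate. Another option there is the standard library's Queue.task_done() / Queue.join() pair. The consumer marks each item as done after processing it, and join() returns once every queued item has been handled, without the threads having to die. Queue.join() takes no timeout, so the sketch below runs it in a helper thread that is joined with one. DaemonConsumer and wait_for_queue() are hypothetical names, and a plain list again stands in for Log.

```python
import threading
from queue import Queue


class DaemonConsumer:
    """Never terminates; reports progress through Queue.task_done()."""

    def __init__(self, queue, log):
        self.queue = queue
        self.log = log

    def consume(self):
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            item = self.queue.get()
            try:
                self.log.append(item)  # stand-in for the real processing
            finally:
                self.queue.task_done()  # mark the item as fully handled


def wait_for_queue(queue, timeout):
    """Wait until every item put on `queue` has been task_done()'d.

    Queue.join() has no timeout, so it runs in a helper thread that is
    joined with one. Returns True if the queue was drained in time.
    """
    waiter = threading.Thread(target=queue.join, daemon=True)
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()
```

A passing test returns as soon as the last item is processed. As with join() above, the timeout is only reached when something is actually stuck.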

* Remarks and questions

For the moment, I consider my test acceptable if, with a given waiting
time, I can run it a dozen times with no failure. But I find this
approach (a) time consuming, which is bad since tests that take too
long won't be run, and (b) too empirical to give confidence in my code
(one of my machines is four years old and the other one is a few
months old).

Have you faced such testing situations, with threads or with something
else, where your tests were non-deterministic? Do you have any
recipes / hints / tips for functional testing or unit testing
multi-threaded applications that you are willing to share?
     So far I have not found much, and most of it was oriented towards
testing theory, not very pragmatic for my day-to-day testing activity.

From another perspective than testing, this 'exercise' confirms the
following: if you need a multi-threaded application, start with
threads from the beginning, as it is completely different in spirit
from a mono-threaded one. (Taking a mono-threaded app and trying to
patch it to be multi-threaded can cost more than rewriting the
application, and in addition the patched result will certainly not be
as good.)

Cheers,

r.

--
Raphael Marvie, PhD                http://www.lifl.fr/~marvie/
Maître de Conférences / Associate Professor  @  LIFL -- IRCICA
Directeur du Master Informatique Professionnel spécialité IAGL
Head of Master's in Software Engineering     +33 3 28 77 85 83