[TIP] Fwd: Testing multi-threaded applications

Will Guaraldi will.guaraldi at gmail.com
Tue May 8 07:38:56 PDT 2007


Whoops--forgot to copy the list.

---------- Forwarded message ----------
From: Will Guaraldi <will.guaraldi at gmail.com>
Date: May 8, 2007 10:38 AM
Subject: Re: [TIP] Testing multi-threaded applications
To: Raphael Marvie <raphael.marvie at lifl.fr>


I'm not really sure I understand what it is you're actually trying to
test.  Are you testing to make sure there's no possibility for dead
lock?  Are you testing to make sure threads can't starve?  Are you
testing that the threads are interacting as you've asked them to?

I think that non-deterministic tests are way bad because you can't
rely on the result of the test.  If you can't rely on the result of
the test, then you can't accurately compare it with your expectations
of the result and that's not particularly useful.

You mentioned doing a thread join on the threads to wait until they've
completed.  What's the data flow of your application?  Depending on
the data flow graph, it might be possible to do a thread join on one
or two specific threads rather than all the threads.
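
A minimal sketch of that idea, with entirely hypothetical names: if the data flow funnels into one "sink" thread, joining that single thread is enough to know the whole pipeline has finished.

```python
import threading
import queue  # Python 3 name; the module was called Queue in 2007

def produce(q, items):
    for item in items:
        q.put(item)

def consume(q, log, expected):
    # exits once it has seen the expected number of items
    while len(log) < expected:
        log.append(q.get())

q = queue.Queue()
log = []
producers = [
    threading.Thread(target=produce, args=(q, range(i * 5, i * 5 + 5)))
    for i in range(2)
]
sink = threading.Thread(target=consume, args=(q, log, 10))
for t in producers:
    t.start()
sink.start()
sink.join()  # one join on the sink, not one per worker thread
```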

Is there a way to have your threads maintain their status on a
scoreboard of some kind?  For example, when they get a job to perform,
they update status information somewhere accessible.  Then when the
thread is done with the job, it updates the status information
again--perhaps this time marking that it's completed another job.  If
you can keep some status information somewhere, then you can have your
test check to see if all the threads have completed what you've asked
them to complete and end the test at that point in time.
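
A minimal sketch of the scoreboard idea, assuming illustrative names throughout: each worker records how many jobs it has completed in a lock-protected dict, and the test polls that dict until every worker reports done, with a generous deadline that only matters when something is actually wrong.

```python
import threading
import time

class Scoreboard:
    """Lock-protected status store shared by all worker threads."""
    def __init__(self):
        self._lock = threading.Lock()
        self._completed = {}

    def job_done(self, worker):
        with self._lock:
            self._completed[worker] = self._completed.get(worker, 0) + 1

    def all_done(self, workers, expected):
        with self._lock:
            return all(self._completed.get(w, 0) >= expected
                       for w in workers)

def work(board, name, jobs):
    for _ in range(jobs):
        board.job_done(name)  # update the scoreboard after each job

board = Scoreboard()
workers = ['worker-1', 'worker-2']
threads = [threading.Thread(target=work, args=(board, w, 3))
           for w in workers]
for t in threads:
    t.start()

# poll the scoreboard instead of sleeping a fixed amount of time
deadline = time.time() + 5.0
while not board.all_done(workers, 3) and time.time() < deadline:
    time.sleep(0.01)
```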

A test that takes too long isn't necessarily bad--you don't _have_ to
run all of your tests all the time.  Perhaps you should take the
longer-running tests and group them into a "systems test suite" which
gets run less often than unit tests, functional tests, or regression
tests.  You could throw the systems tests into a buildbot and have it
automatically run them against what's in version control every night.
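
One stdlib-only way to keep such a slow "systems" suite out of the default run is to gate it on an environment variable (nose's attrib plugin offers tagging for the same purpose). This is a minimal sketch; the `RUN_SYSTEMS_TESTS` variable name is an assumption, not an existing convention:

```python
import os
import unittest

# Gate the slow suite on an environment variable (an assumed name),
# so the quick everyday run skips it automatically.
@unittest.skipUnless(os.environ.get('RUN_SYSTEMS_TESTS'),
                     'systems tests only run when RUN_SYSTEMS_TESTS is set')
class SystemsTest(unittest.TestCase):
    def test_long_running_scenario(self):
        self.assertTrue(True)  # placeholder for an hours-long scenario

suite = unittest.defaultTestLoader.loadTestsFromTestCase(SystemsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
# without RUN_SYSTEMS_TESTS the suite is skipped, so the fast run stays fast
```

The nightly build job would simply export the variable before invoking the test runner.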

I don't have a whole lot of experience with TDD and multi-threaded
applications, so that's all I can think of without building one now
and experimenting.  I haven't read much regarding testing
multi-threaded applications.  In general having multiple threads makes
application development much more complex to design and debug--I
imagine this holds true for testing as well.

/will


On 5/8/07, Raphael Marvie <raphael.marvie at lifl.fr> wrote:
>
> Working on a multi-threaded "daemon" application, I have tried as
> much as possible to apply TDD, identifying issues along the way.
> Beyond TDD, the following discussion may relate to testing
> multi-threaded apps in general. All the tests pass (but I do not
> like them that much) and the application now works, so I would like
> to share some thoughts and questions.
>
> The producer / consumer pattern is taken as an example, first
> because it is quite classic and well-known, and second because our
> application follows this pattern, with computations that usually
> take several hours.
>
> * Implementing a mono-threaded version of a producer / consumer
>
> A mono-threaded version of our application is quite easy to test as
> only one action can be taken at a time. The specification is
> summarized as: The consumer has to process all the items provided by
> the producer through the shared queue. Every item processed by a
> consumer is added to a log.
>
> So far nothing states whether the application is supposed to be
> mono- or multi-threaded. We may prefer not to go multi-threaded
> until it is identified as necessary. Moreover, the functional aspect
> of the application itself is not strongly related to being mono- or
> multi-threaded (this can be seen as an implementation issue), thus
> we can expect the test to be independent of threads.
>
> <pre>
>         def test_consumer_processes_all_items_provided_by_producer():
>             queue = Queue()
>             log = Log()
>             items = range(10)
>             Producer(queue).produce(items)
>             Consumer(queue, log).consume()
>             assert queue.empty()
>             for item in items:
>                 assert log.contains(item)
> </pre>
>
> Running nosetests says that everything is fine with a first
> "serial" / mono-threaded implementation.
>
> <pre>
>      $ nosetests mono.py
>      mono.test_consumer_processes_all_items_provided_by_producer ... ok
>
>      ----------------------------------------------------------------------
>      Ran 1 test in 0.007s
>
>      OK
> </pre>
>
>
> * Towards a multi-threaded version of a producer / consumer
>
> When going multi-threaded, testing becomes more interesting (as it
> is less trivial). The test fails as it does not give the threads
> enough time to complete their task.
>
>         Remark /A functional test does not seem to be writable
>         independently of the threading policy of the application./
>
> Another test has to be written that waits long enough for all the
> threads to complete their task before checking the assertions.
>
> <pre>
>         def test_threaded_consumer_processes_all_items_provided_by_producer():
>             queue = Queue()
>             log = Log()
>             items = range(10)
>             Producer(queue).produce(items)
>             Consumer(queue, log).consume()
>             wait_for_some_time()
>             assert queue.empty()
>             for item in items:
>                 assert log.contains(item)
> </pre>
>
> We have a first 'ouch' in our TDD approach: how long should I wait
> for the test to complete without waiting too much (as when tests
> take too long, nobody runs them)? My first thought was to wait long
> enough for all threads to complete their work even if they were not
> running concurrently. However, I cannot estimate this time before
> writing the code, thus I cannot go tests-first. In addition, I find
> it fragile to define my tests according to a factor that varies from
> one machine to another.
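
One way around guessing a single fixed sleep is to poll: wait in small steps until the condition holds or a generous deadline expires. A passing test then finishes as soon as the threads do, on fast and slow machines alike; the timeout only bites on failure. A minimal sketch, where `wait_until` is an illustrative helper and not part of any library:

```python
import threading
import time
import queue  # Python 3 name for the Queue module

def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate until it is true or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

q = queue.Queue()
log = []

def consume():
    for _ in range(10):
        log.append(q.get())

threading.Thread(target=consume).start()
for item in range(10):
    q.put(item)

# returns as soon as the consumer is done, not after a fixed sleep
assert wait_until(lambda: len(log) == 10)
assert q.empty()
```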
>
>
> I have finally started writing the code with a test I knew was
> "probably" going to pass and "probably" going to fail. Out of a
> hundred runs of this test, only 60% went green and 40% went red. How
> can one rely on such a non-deterministic test?
>
>         Question /How can a test decide whether a multi-threaded
>         function passes or fails? How long should we wait for the
>         threads to complete their job?/
>
>
> Another solution could be to use a join() on the threads in the test
> function, so that the assertions are evaluated only when the threads
> have terminated. There are other 'ouch'es if we vote for this choice.
>      First, in my daemon application the threads are not supposed to
> terminate when all the items of the queue are processed. So instead
> of waiting for them to die we have to define a timeout, which has
> the same drawback as the previous approach.
>      Second, if the threads are not supposed to finish, we may ask
> them to do so. Exposing the threads outside of the consumer is a
> problem, as we do not want to expose sensitive parts of our
> implementation, nor do we want client code to be thread dependent.
> So a method could be added to the producers and consumers in order
> to ask them to stop. This method could then be available in both the
> mono- and multi-threaded versions (and be a no-op in the first). But
> we have not solved the problem of deciding how long to wait before
> asking the producers and consumers to stop (considering that if they
> haven't completed their task by then, it's too late).
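
The "ask them to stop" idea can be sketched with a threading.Event: the consumer loops forever like a daemon but exposes a stop() method, and the test waits for the work to drain, then stops the thread and joins with a bounded timeout. Class and method names here are assumptions, not the original code:

```python
import threading
import time
import queue  # Python 3 name for the Queue module

class StoppableConsumer(threading.Thread):
    """Daemon-style consumer that runs until asked to stop."""
    def __init__(self, q, log):
        threading.Thread.__init__(self)
        self.q = q
        self.log = log
        self._stop_requested = threading.Event()

    def run(self):
        while not self._stop_requested.is_set():
            try:
                # a short get() timeout keeps the loop responsive to stop()
                self.log.append(self.q.get(timeout=0.05))
            except queue.Empty:
                pass

    def stop(self):
        self._stop_requested.set()

q = queue.Queue()
log = []
consumer = StoppableConsumer(q, log)
consumer.start()
for item in range(10):
    q.put(item)

deadline = time.time() + 5.0          # drain first, with an upper bound
while len(log) < 10 and time.time() < deadline:
    time.sleep(0.01)
consumer.stop()
consumer.join(timeout=5.0)            # bounded join instead of waiting forever
```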
>
> Finally, I have found a compromise between time to wait and always
> successful tests. However, this compromise holds for my
> configuration and my machine only. It takes ten times longer to test
> the multi-threaded version than the mono-threaded one. While we use
> threads to improve performance, the need for waiting in tests makes
> them run slower for multi-threaded applications than for
> mono-threaded ones.
>
> <pre>
>      $ nosetests multi.py
>      multi.test_threaded_consumer_processes_all_items_provided_by_producer ... ok
>
>      ----------------------------------------------------------------------
>      Ran 1 test in 0.713s
>
>      OK
> </pre>
>
>
> * Remarks and questions
>
> For the moment, I consider my test acceptable if, waiting for a
> given time, I can run it a dozen times with no failure. But I find
> this approach (a) time consuming, thus bad for tests that won't be
> run if they take too long, and (b) too empirical to provide
> confidence in my code (as one of my machines is four years old and
> the other is a few months old).
>
> Have you faced such a testing situation, with threads or with
> something else, where your tests were non-deterministic? Do you have
> any recipes / hints / tips for functional testing or unit testing
> multi-threaded applications that you are willing to share?
>      I have not found much for the moment, and most of it was
> testing-theory oriented, not very pragmatic for my day-to-day
> testing activity.
>
> From another perspective than testing, this 'exercise' confirms the
> following: if you need a multi-threaded application, start with
> threads from the beginning, as it is completely different in spirit
> from a mono-threaded one. (Taking a mono-threaded app and trying to
> patch it to be multi-threaded can cost more than rewriting the
> application, and the final result will certainly not be as good.)
>
> Cheers,
>
> r.
>
> --
> Raphael Marvie, PhD                http://www.lifl.fr/~marvie/
> Maître de Conférences / Associate Professor  @  LIFL -- IRCICA
> Directeur du Master Informatique Professionnel spécialité IAGL
> Head of Master's in Software Engineering     +33 3 28 77 85 83


