[TIP] Thoughts on Fuzz Testing

Mark Waite mark.earl.waite at gmail.com
Tue Oct 20 16:43:52 PDT 2015

Thanks for an interesting question!  I disagree with at least some portions
of the thesis.  Comments are inline.

On Tue, Oct 20, 2015 at 4:06 PM Randy Syring <randy at thesyrings.us> wrote:

> I recently had a chat with my team about fuzz testing.  The thesis as
> proposed is:
> Fuzz tests are useful as tools for a developer to run manually which help
> identify corner cases in the code not covered by
> explicit-branch-based-unit-testing (feel free to help me with the term I
> want here).
I don't think that statement covers the many ways that fuzz testing (and
other forms of test randomization) can help a developer.

Fuzz testing and other forms of test randomization are useful in far more
ways than identifying corner cases in code not covered explicitly by unit
tests.  Fuzz testing and other test randomization techniques can be
helpful in:

   - Increasing the probability of detecting data and logic race conditions
   - Increasing the probability of detecting interesting external
   dependencies (timing or logic or data related)
   - Increasing the statement or branch level test coverage
   - Detecting scenarios which randomization techniques are unlikely to
   reach so that specialized test setups can be constructed to reach them

The 2007 paper on Flayer is very interesting reading on a fuzz testing
technique that was used to find security problems in OpenSSH and OpenSSL.

The Wikipedia article on fuzz testing gives some good examples of strengths
and weaknesses of the technique.  https://en.wikipedia.org/wiki/Fuzz_testing

> They should not run by default in an automated testing suite (for
> performance & time considerations).
I disagree.  Executing randomization tests in an automated test suite tends
to increase the number of variations which are tested, since the automated
test suites are likely run very frequently.  That increases the chances
that a bug will be detected by the random tests.

I think it is reasonable to decide that randomization tests executed in a
test suite will be time limited so that they only consume some designated
portion of the execution time budget.
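
A minimal sketch of a time-limited randomization test, assuming a
hypothetical helper named run_randomized and a stand-in property
(check_sorted_is_ordered) that are not from any real project.  The helper
repeats a check until a time budget is spent.  Recording the seed is an
added assumption, so that a failing run can be replayed.

```python
import random
import time


def run_randomized(check, budget_seconds, seed=None):
    """Call check(rng) repeatedly until the time budget is spent.

    Returns (seed, iterations).  The seed is included in any failure
    message so the same sequence of inputs can be replayed.
    """
    if seed is None:
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_seconds
    iterations = 0
    while True:
        try:
            check(rng)
        except AssertionError as exc:
            raise AssertionError(
                f"failed with seed={seed} after {iterations} iterations: {exc}"
            ) from exc
        iterations += 1
        # Always run at least once, then stop when the budget is used up.
        if time.monotonic() >= deadline:
            break
    return seed, iterations


def check_sorted_is_ordered(rng):
    # Hypothetical property, used only to give the helper something to run.
    data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
    result = sorted(data)
    assert all(a <= b for a, b in zip(result, result[1:]))


if __name__ == "__main__":
    seed, count = run_randomized(check_sorted_is_ordered, budget_seconds=0.2)
    print(f"seed={seed} iterations={count}")
```

Passing the reported seed back in through the seed argument replays the
same inputs, provided the check draws all of its randomness from rng.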

If no one is watching the automated tests for success and failure, then
automated execution of randomization tests is not helpful, since
randomization tests won't fail the same way on each test run.  However, if
no one is watching the tests for success and failure, why run the tests?

> They should not be used as an excuse for lazy developers to not write
> explicit test cases that sufficiently cover their code.
Randomization testing is typically not a replacement for tests with known
inputs and expected outputs, though I don't think it has much to do with
"lazy developers".  It is much more challenging to define the "test oracle"
(the detector which decides if a particular behavior is a bug or not) for a
randomized test than for a test with carefully selected inputs.

Unit tests with carefully selected inputs can assert that the outputs
exactly match the values expected for those inputs.  Unit tests with random
inputs have more difficulty asserting exact output unless they are willing
to review the inputs and perform independent calculations to predict the
expected output.
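
The difference between the two kinds of oracle can be sketched as follows.
The function dedupe_sorted is a hypothetical function under test, chosen
only for illustration.  The first test asserts an exact output for a known
input.  The second test cannot know the exact output in advance, so it
checks properties that any correct output must satisfy.

```python
import random


def dedupe_sorted(items):
    """Hypothetical function under test: unique items in ascending order."""
    return sorted(set(items))


def test_known_input():
    # Carefully selected input: the exact expected output is known.
    assert dedupe_sorted([3, 1, 3, 2]) == [1, 2, 3]


def test_random_inputs(seed=12345, rounds=200):
    # Random input: the oracle checks properties instead of exact values.
    rng = random.Random(seed)
    for _ in range(rounds):
        items = [rng.randint(0, 20) for _ in range(rng.randint(0, 30))]
        result = dedupe_sorted(items)
        # Same elements as the input, ignoring duplicates.
        assert set(result) == set(items)
        # Strictly increasing, so sorted and free of duplicates.
        assert all(a < b for a, b in zip(result, result[1:]))


if __name__ == "__main__":
    test_known_input()
    test_random_inputs()
```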

The Flayer article describes some complicated pre-conditions which had to
be satisfied before key areas in the OpenSSH code could be fuzz tested.
Those complicated pre-conditions are somewhat akin to unit tests with
carefully selected inputs and precisely asserted outputs.

A specific example I can give from recent experience may help:

While fixing a timestamp bug in the Jenkins git plugin, I wrote tests with
specific input values and specific expected results.  I verified that those
tests ran correctly on the platforms I was testing with the git plugin
(Ubuntu 14, Debian 8, Debian 7, Debian 6, CentOS 7, CentOS 6, Windows).  I
tried to test several interesting boundary cases.

On a whim, I decided to use randomly selected subsets of the commit history
of the git plugin itself to test the timestamp reporting function.  Since I
couldn't easily predict the expected value of the timestamp, I asserted
only that the timestamp fell between the time of the first commit to the
repository and the current time.  The test ran great on my development
platform.  The test seemed to run well on the other platforms.

One of the test runs failed on CentOS 6.  The timestamp came back as -1.  I
was completely surprised.  I investigated further and found that the git
version shipped by default on CentOS 6 would report a bad timestamp for 5
commits of 2200+ in the git plugin history.  Further digging showed that
the same 5 commits had the same problem on the git version shipped with
Debian 6.  None of the other platforms (or git versions) had that problem.

Randomization testing discovered that problem.
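
The bounded check described above can be sketched roughly as follows.  This
is not the git plugin's test code.  The function name find_out_of_range,
its parameters, and the stand-in commit data are all hypothetical, and a
real test would read commits and timestamps from a repository.

```python
import random
import time


def find_out_of_range(commits, get_timestamp, first_commit_time,
                      sample_size, rng):
    """Return (commit, timestamp) pairs that fall outside the bounds.

    A timestamp is accepted when it lies between the time of the first
    commit and the current time, since the exact value is not predicted.
    """
    now = time.time()
    sample = rng.sample(commits, min(sample_size, len(commits)))
    bad = []
    for commit in sample:
        timestamp = get_timestamp(commit)
        if not (first_commit_time <= timestamp <= now):
            bad.append((commit, timestamp))
    return bad


if __name__ == "__main__":
    # Stand-in history: commit id -> reported timestamp (seconds).
    fake_history = {"c1": 1_300_000_000, "c2": 1_350_000_000, "c3": -1}
    bad = find_out_of_range(
        list(fake_history),
        fake_history.get,
        first_commit_time=1_300_000_000,
        sample_size=3,
        rng=random.Random(0),
    )
    print(bad)
```

A loose oracle like this cannot confirm that a timestamp is correct, but it
is enough to flag a value such as -1.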

Cem Kaner tells a story in the Black Box Software Testing course of cases
like that where a value was consistently incorrect, even though there were
no obvious predictors that it would be incorrect.

Mark Waite

> I'm interested in feedback on the above.  Agree, disagree, and most
> importantly, why.
> Thanks.
> *Randy Syring*
> Husband | Father | Redeemed Sinner
> *"For what does it profit a man to gain the whole world and forfeit his
> soul?" (Mark 8:36 ESV)*
> _______________________________________________
> testing-in-python mailing list
> testing-in-python at lists.idyll.org
> http://lists.idyll.org/listinfo/testing-in-python