<div dir="ltr">Thanks for an interesting question!  I disagree with at least some portions of the thesis.  Comments are inline.<div><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 20, 2015 at 4:06 PM Randy Syring &lt;<a href="mailto:randy@thesyrings.us">randy@thesyrings.us</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  
  <div text="#000000" bgcolor="#FFFFFF">

    I recently had a chat with my team about fuzz testing.  The thesis

    as proposed is:<br>

    <br>

    <blockquote type="cite">Fuzz tests are useful as tools for a

      developer to run manually which help identify corner cases in the

      code not covered by explicit-branch-based-unit-testing (feel free

      to help me with the term I want here).  </blockquote></div></blockquote><div><br></div><div>I don&#39;t think that statement covers the many ways that fuzz testing (and other forms of test randomization) can help a developer.</div><div><br></div><div>These techniques are useful for far more than identifying corner cases that explicit unit tests do not cover.  Fuzz testing and other test randomization techniques can also help by:</div><div><ul><li>Increasing the probability of detecting data and logic race conditions<br></li><li>Increasing the probability of detecting interesting external dependencies (timing, logic, or data related)</li><li>Increasing statement-level or branch-level test coverage</li><li>Identifying scenarios that randomization techniques are unlikely to reach, so that specialized test setups can be constructed to reach them</li></ul><div>The 2007 paper on Flayer is interesting reading.  It describes a fuzz testing technique that was used to find security problems in OpenSSH and OpenSSL.  <a href="http://valgrind.org/docs/drewry2007.pdf">http://valgrind.org/docs/drewry2007.pdf</a> </div></div><div><br></div><div>The Wikipedia article on fuzz testing gives some good examples of the strengths and weaknesses of the technique.  <a href="https://en.wikipedia.org/wiki/Fuzz_testing">https://en.wikipedia.org/wiki/Fuzz_testing</a></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite">They should not run by

      default in an automated testing suite (for performance &amp; time

      considerations).  </blockquote></div></blockquote><div><br></div><div>I disagree.  Executing randomization tests in an automated test suite tends to increase the number of variations that are tested, since automated test suites are likely to be run very frequently.  That increases the chances that a bug will be detected by the random tests.</div><div><br></div><div>I think it is reasonable to time-limit the randomization tests in a test suite so that they consume only a designated portion of the execution time budget.</div><div><br></div><div>If no one is watching the automated tests for success and failure, then automated execution of randomization tests is not helpful, since randomization tests won&#39;t fail the same way on each test run.  However, if no one is watching the tests for success and failure, why run the tests?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF"><blockquote type="cite">They should not be used as an excuse for lazy

      developers to not write explicit test cases that sufficiently

      cover their code.</blockquote></div></blockquote><div><br></div><div>Randomization testing is typically not a replacement for tests with known inputs and expected outputs, though I don&#39;t think it has much to do with &quot;lazy developers&quot;.  It is much more challenging to define the &quot;test oracle&quot; (the detector which decides whether a particular behavior is a bug) for a randomized test than for a test with carefully selected inputs.  </div><div><br></div><div>Unit tests with carefully selected inputs can assert that the outputs exactly match the values expected for those inputs.  Unit tests with random inputs have more difficulty asserting exact output unless they are willing to review the inputs and perform independent calculations to predict the expected output.</div><div><br></div><div>The Flayer article describes some complicated pre-conditions which had to be satisfied before key areas in the OpenSSH code could be fuzz tested.  Those complicated pre-conditions are somewhat akin to unit tests with carefully selected inputs and precisely asserted outputs.</div><div><br></div><div>A specific example from recent experience may help:</div><div><br></div><div>While fixing a timestamp bug in the Jenkins git plugin, I wrote tests with specific input values and specific expected results.  I verified that those tests ran correctly on the platforms where I test the git plugin (Ubuntu 14, Debian 8, Debian 7, Debian 6, CentOS 7, CentOS 6, Windows).  I tried to test several interesting boundary cases.</div><div><br></div><div>On a whim, I decided to use randomly selected subsets of the commit history of the git plugin itself to test the timestamp reporting function.  Since I couldn&#39;t easily predict the expected value of the timestamp, I required only that the timestamp have a value between the time of the first commit to the repository and the current time.  The test ran great on my development platform.  
The test seemed to run well on the other platforms.  </div><div><br></div><div>Then one of the test runs failed on CentOS 6.  The timestamp came back as -1.  I was completely surprised.  I investigated further and found that the git version shipped by default on CentOS 6 would report a bad timestamp for 5 of the 2200+ commits in the git plugin history.  Further digging showed that the same 5 commits had the same problem with the git version shipped with Debian 6.  None of the other platforms (or git versions) had that problem.</div><div><br></div><div>Randomization testing discovered that problem.</div><div><br></div><div>Cem Kaner tells a story in the Black Box Software Testing course about cases like that, where a value was consistently incorrect even though there were no obvious predictors that it would be incorrect.</div><div><br></div><div>Thanks!</div><div>Mark Waite</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div text="#000000" bgcolor="#FFFFFF">

    I&#39;m interested in feedback on the above.  Agree, disagree, and

    most importantly, why.<br>

    <br>

    Thanks.<br>

    <div><br>

      <b>Randy Syring</b><br>

      <small>Husband | Father | Redeemed Sinner</small><br>

      <br>

      <i><small>&quot;For what does it profit a man to gain the whole world<br>

          and forfeit his soul?&quot; (Mark 8:36 ESV)</small></i>

      <br>

      <br>

    </div>

  </div>


_______________________________________________<br>

testing-in-python mailing list<br>

<a href="mailto:testing-in-python@lists.idyll.org" target="_blank">testing-in-python@lists.idyll.org</a><br>

<a href="http://lists.idyll.org/listinfo/testing-in-python" rel="noreferrer" target="_blank">http://lists.idyll.org/listinfo/testing-in-python</a><br>

</blockquote></div></div></div>
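<div dir="ltr"><div><br></div><div>As an illustration of the git plugin example in the reply above, the combination of random subsets, a range check in place of an exact expected value, and a limit on execution time might be sketched in Python roughly as follows.  This is only a sketch under stated assumptions: <code>HISTORY</code>, <code>report_timestamp</code> and <code>run_randomized_check</code> are hypothetical names invented here, not code from the git plugin tests, and the explicit seed is an addition so that a failing random run can be replayed.</div>
<pre>
```python
import random
import time

# Hypothetical stand-in for a commit history: (commit id, timestamp) pairs.
FIRST_COMMIT_TIME = 1_000_000_000
HISTORY = [("c%04d" % i, FIRST_COMMIT_TIME + i * 3600) for i in range(2200)]


def report_timestamp(commit_id, timestamp):
    """Hypothetical function under test; a real test would call the real code."""
    return timestamp


def run_randomized_check(report, history, seed, budget_seconds=0.5, sample_size=20):
    """Check random subsets of history against a range oracle within a time budget.

    Returns the (commit_id, reported_value) pairs that violated the oracle.
    At least one random subset is always checked.
    """
    rng = random.Random(seed)  # seeded generator, so a failing run is repeatable
    first = min(ts for _, ts in history)
    now = int(time.time())
    deadline = time.monotonic() + budget_seconds
    failures = []
    while True:
        for commit_id, ts in rng.sample(history, sample_size):
            reported = report(commit_id, ts)
            # Range oracle: the exact value is not predicted, only its bounds.
            if not (first <= reported <= now):
                failures.append((commit_id, reported))
        if time.monotonic() >= deadline:
            break
    return failures


if __name__ == "__main__":
    seed = random.SystemRandom().randrange(2**32)
    print("randomized check seed:", seed)  # log the seed to replay a failure
    failures = run_randomized_check(report_timestamp, HISTORY, seed)
    assert failures == [], failures
```
</pre>
<div>Logging the seed is one way to soften the concern that randomization tests do not fail the same way on each run: rerunning with the logged seed selects the same subsets again, provided the history itself has not changed.</div></div>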