[TIP] Implementation details of x-outcomes WAS: unittest outcomes (pass/fail/xfail/error), humans and automation

Olemis Lang olemis at gmail.com
Tue Dec 15 06:12:39 PST 2009


On Mon, Dec 14, 2009 at 5:05 PM, Robert Collins
<robertc at robertcollins.net> wrote:
> On Mon, 2009-12-14 at 09:55 -0500, Olemis Lang wrote:
>> On Mon, Dec 14, 2009 at 3:41 AM, Robert Collins
>> <robertc at robertcollins.net> wrote:
>> >
>>
>> Like I mentioned before there are three main outcomes out there :
>>
>>   - pass : the SUT satisfies the test condition (according to the
>>     judgments of the author of the test ;o)
>>   - failure : the SUT does not satisfy the test condition
>>   - error : the test code contains bugs
>>
>> Everybody agree ?
>
> No :). POSIX specifies many more outcomes;

I just want to be in sync here. Are you talking about a standard in the
POSIX family [1]_ that is specific to testing and mentions a set of
outcomes?

If you're talking about the general POSIX family of standards, then my
opinion is that its scope is more complex than the simple fact that
test cases either pass, fail, or contain bugs.

Otherwise, could you provide a reference so I can consult the specific
standard and understand what you're saying?

> users want more outcomes,

I know, and I agree. But in the end you'll have to decide whether it's
a failure, an error, or a success, so those are the three *MAIN*
outcomes. All the others are more specific instances of them, carrying
extra information and (especially) custom semantics.
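
Just to illustrate what I mean, here is a toy sketch; the custom
outcome names and the mapping itself are only examples (and debatable),
not a proposal:

    # Toy illustration only: every custom outcome ultimately reduces to
    # one of the three main outcomes, plus extra semantics.
    MAIN_OUTCOMES = ('pass', 'failure', 'error')

    CUSTOM_OUTCOMES = {
        # custom outcome       (main outcome, extra semantics)
        'skip':                ('pass',    'the test was not run at all'),
        'expected-failure':    ('pass',    'known bug, failure anticipated'),
        'unexpected-success':  ('failure', 'known bug apparently fixed'),
        'missing-feature':     ('failure', 'optional dependency not available'),
    }

    def main_outcome(outcome):
        """Reduce any outcome to pass / failure / error for a plain report."""
        if outcome in MAIN_OUTCOMES:
            return outcome
        return CUSTOM_OUTCOMES[outcome][0]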

> and
> in fact the difference between failure and error is meaningless for many
> people, while being important for others.
>

I do think it's important to consider ERROR as a *main* outcome. So
the right choice IMO should be to keep it and let others decide
whether it's useful for them or not. An error in the test code is
definitely something different from an error in the SUT, and the
corrective actions and implications might be completely different as
well.
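
For the record, this is exactly the line stock unittest already draws:
an AssertionError is reported as a failure, while any other exception
is reported as an error. A minimal example:

    import unittest

    class Demo(unittest.TestCase):

        def test_failure(self):
            # The SUT does not satisfy the expectation: AssertionError -> failure
            self.assertEqual(2 + 2, 5)

        def test_error(self):
            # The test code itself is broken: any other exception -> error
            {}['missing key']

    if __name__ == '__main__':
        unittest.main()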

>> The goal then IMO should be to refine those outcomes (isn't it ?)
>>
>> If so :
>>   - pass could be represented by warnings and there is a whole std
>> hierarchy for that .
>>   - failure could be represented by exceptions and there is a whole
>> std hierarchy for that .
>>   - something similar for errors (AFAICR there's an exception type for that)
>>
>> so my question is :
>>
>> Q:
>>   - Why not to leave the overall API almost intact and just add
>>     new (exception | warning) hierarchy for testing purposes with
>>     further details included inside of exception objects ?
>
> I prefer not to discuss possible implementations till I've got a decent
> sense of the goals and constraints :)
>

IMO, it's better to follow an iterative approach and refine the
analysis as new (fresh) ideas emerge ;o)

Well, a suggestion:

  - Why not create a wiki page (like this one [2]_ ) and use a table to compare
    the different solution candidates (e.g. columns) against the requirements
    (e.g. rows) mentioned in the other thread?

The fact is that mail archives and/or threads are not, IMO, an
appropriate way to analyze the whole picture in order to make a
decision. It'd be nice to summarize the discussion in such a place and
then use it as a reference to decide.

So far I remember two or three candidates:

  - Create a hierarchy of (exceptions & warnings) and «extend» the current API
    in order to support the requirements. Custom test statuses would be
    sub-classes of those top-level exception types (see the rough sketch
    after this list).
  - The approach mentioned in the other thread and implemented by Pandokia
    (AFAICR), which consists of binding custom attributes to test results (CMIIW).
  - Probably (couldn't read about it ;o) something similar to nose's
    ErrorClass plugins [3]_ .
  - You can add your own ... ;o)
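
To make the first candidate a little more concrete, here is a rough
sketch. None of these class names exist anywhere; they are invented
just to show the shape of the idea:

    # Hypothetical hierarchy (all names invented for illustration).
    # A result object would catch TestOutcome subclasses exactly as it
    # catches AssertionError today, and would record the warnings
    # without failing the run.

    class TestOutcome(Exception):
        """Base class for exceptional outcomes (anything but a plain pass)."""

    class TestFailure(TestOutcome):
        """The SUT does not satisfy the test condition."""

    class TestError(TestOutcome):
        """The test code itself contains bugs."""

    class MissingFeature(TestFailure):
        """Custom outcome: refines 'failure' with extra semantics."""

    class TestOutcomeWarning(Warning):
        """Base class for 'passing' outcomes that still carry information."""

    class ExpectedFailureWarning(TestOutcomeWarning):
        """A known bug failed as anticipated; the run still counts as a pass."""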

As new requirements come and go they could be prioritized, and pros & cons
would be discussed on the list and documented in the wiki page.

What do you think?

>> > I want to:
>> >  - be able to automate with them: if I add an outcome 'MissingFeature',
>> > I want to be able to add workflow and reports on that in my CI system or
>> > test repository.
>> >
>>
>> while being language agnostic ? is that still a requirement ?
>
> I don't think language agnostic can really apply while we're talking
> about the python unittest API.

OK, that's what I thought.

> I will want to figure out how to tunnel
> whatever design we come up with through subunit, in both directions -
> e.g. how to represent a custom outcome in junit in python - but thats my
> problem ;).
>

How do they (custom outcomes) work in JUnit? If that's possible there,
then perhaps it could be the inspiration for another candidate ;o)

>> >  - I want to be able to control whether this outcome should be
>> > considered ordinary or exceptional: should it cause a test suite to
>> > error, or even to stop early?
>> >
>>
>> This interpretation of (pass | fail) may be very controversial and
>> probably we end up with a parallel hierarchy similar to ImportError
>> and ImportWarning
>
> I don't think I'm really interpreting here at all -

I was not talking about you. I was talking about the testing framework and
the test author.


> if you look at all
> the CI tools around, they have varying degrees of sophistication, but
> what they all seem to agree on is that a test run either passes, or does
> not pass.

Confirming what I (and also others, considering e.g. the comment about
Pandokia in the other thread ;o) think: let's keep the three main
outcomes and build custom outcomes on top of them.

> Being able to inform an extensible system as to whether a new
> outcome stops the test run being a pass is pretty important on that
> basis:

That's just one (important) example. I'm sure somebody will come up
with other behaviors in the future.

> if you cannot inform the system then new outcomes are either
> always going to make the run 'fail', or they are never going to make the
> run 'fail' - and its pretty easy to guarantee someone will be
> unhappy ;).

Yes, I'm configured to be unhappy by default
:P

> As an experiment, imagine that python 2.7's skip, expected
> fail and unexpected success were written as extensions, not as a patch
> to the core. We should be able to do that, with whatever system we come
> up with.
>

I am not very aware of that solution (and I suppose you are just
talking about stopping the test runner), but why not just write (1) a
custom test runner (probably a descendant of unittest.TextTestRunner
;o), (2) a mixin, or (3) something based on the GoF decorator pattern,
in order to add such capabilities to the runner itself? If (3) is
chosen, then such capabilities could be added to an existing test
runner at run-time. I chose (1) once upon a time in order to
incorporate warnings (partially, you all know the story ...) into the
runner and the test cases. I don't like mixins, but that's definitely
another std-ish approach. And you will always be able to subclass any
test runner and make it work the way you want ;o)

PS: You'll also need to enhance TestResult because it's strongly
coupled to the test runner by design :-/
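
A rough, hypothetical sketch of option (1), including the TestResult
part from the PS. MissingFeature, the missing_features bucket, and the
class names are all invented here just for illustration:

    import unittest

    class MissingFeature(Exception):
        """Hypothetical custom outcome a test may raise (invented name)."""

    class OutcomeAwareResult(unittest.TestResult):
        """Keeps the three main outcomes but adds a bucket for the custom one."""

        def __init__(self):
            unittest.TestResult.__init__(self)
            self.missing_features = []

        def addError(self, test, err):
            # TestCase.run() reports any non-assertion exception here, so
            # this is the hook where a custom outcome can be routed to its
            # own bucket instead of the generic 'error' one.
            if isinstance(err[1], MissingFeature):
                self.missing_features.append((test, err))
            else:
                unittest.TestResult.addError(self, test, err)

        def wasSuccessful(self):
            # Decide here whether the custom outcome keeps the run a 'pass'.
            return (unittest.TestResult.wasSuccessful(self)
                    and not self.missing_features)

Such a result class could then be plugged into a TextTestRunner
subclass by overriding _makeResult(), but that's where the coupling
mentioned in the PS bites: the text runner expects the result to
provide printErrors() and friends.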

.. [1] POSIX
        (http://en.wikipedia.org/wiki/POSIX)

.. [2] PythonTestingToolsTaxonomy - Cheesecake - Trac
        (http://pycheesecake.org/wiki/PythonTestingToolsTaxonomy)

.. [3] nose ErrorClass plugins
        (http://somethingaboutorange.com/mrl/projects/nose/0.11.1/plugins/errorclasses.html)

-- 
Regards,

Olemis.

Blog ES: http://simelo-es.blogspot.com/
Blog EN: http://simelo-en.blogspot.com/

Featured article:
Initial selection in a RadioBox - wxPython-users | Google Groups  -
http://feedproxy.google.com/~r/TracGViz-full/~3/7W1YGW2HKuY/f606817e38bdcb02


