[TIP] branch coverage

Thu Jan 24 13:11:42 PST 2008

On Jan 24, 2008 7:41 AM, Andrew Dalke <dalke at dalkescientific.com> wrote:
> Hi all,
>
> m h <sesquile at gmail.com>:
> >  I really don't want to write a Python compiler, but I
> > am assuming tracing at that level won't be supported in CPython,
> > though it might be in PyPy...
>
>
> On Jan 24, 2008, at 12:32 PM, Laura Creighton wrote:
> > I think that Andrew Dalke, who is cc'd to this note, has already done
> > a bunch of work which would be very useful to you.
>
> I was talking with Laura a few days ago about a lightning talk I
> wanted to present at PyCon: branch coverage.  There's no easy way to
> do it with Python.  The closest is to use the compiler module to
> generate the AST then instrument the AST.  The problem is, the
> compiler module is a bear to work with and doesn't record everything
> I want.  For example, in the coverage report I want to see which
> branches weren't covered, pinpointed to the character range of the
> expression.  Python's AST doesn't have byte positions.

Did you think about using the tokenize module?
http://docs.python.org/lib/module-tokenize.html

"The generator produces 5-tuples with these members: the token type;
the token string; a 2-tuple (srow, scol) of ints specifying the row
and column where the token begins in the source ... and so on"
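A minimal sketch of that suggestion, written for Python 3 rather than the Python 2 of early 2008. The helper name boolean_operator_positions is hypothetical; it reports the (row, col) start and end of every `and`/`or` token, which is the kind of character-range information a branch-coverage report could point at:

```python
import io
import tokenize


def boolean_operator_positions(source):
    """Return (operator, start, end) for every 'and'/'or' token in source.

    start and end are the (row, col) pairs reported by tokenize: rows are
    1-based, columns are 0-based, and end points just past the token.
    """
    positions = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        # Keywords come back as NAME tokens, so filter on the token string.
        # The "not" in "is not" is deliberately ignored here.
        if tok.type == tokenize.NAME and tok.string in ("and", "or"):
            positions.append((tok.string, tok.start, tok.end))
    return positions


src = "if close_fds and (stdin is not None or stdout is not None):\n    pass\n"
print(boolean_operator_positions(src))
# → [('and', (1, 13), (1, 16)), ('or', (1, 36), (1, 38))]
```

Tokens only mark where each operator sits; working out where a whole operand expression starts and ends still needs a parser on top of them.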

I believe this is how the 2to3 script builds its AST.  Also, you might
want to note that the compiler module has officially been removed from
Python 3000.  It sounds like there are plans to replace it with something
comparable, however.

>
> What I've been doing over the last couple of days is writing a PLY
> grammar for Python.  As of last night it parses (and builds the
> trivial concrete syntax tree of) the entire standard library.  Once
> the AST works I plan to convert code like this (from line 548 of
> subprocess.py)
>
>      if close_fds and (stdin is not None or stdout is not None or
>                        stderr is not None):
>          raise ValueError("close_fds is not supported on Windows "
>                           "platforms if you redirect stdin/stdout/stderr")
>
> into something equivalent to this mess
>
>    __reached_statement(100)  # assuming this is statement number 100
>
>    if close_fds:
>      __branch_is_true(1)  # each branch also gets a unique id, with
>                           # some table mapping that to source file
>                           # and byte range
>
>      if stdin is not None:
>        __branch_is_true(2)
>        __result_bool = True
>        __result_obj = True  # the value of "stdin is not None"
>      else:
>        __branch_is_false(2)
>
>        if stdout is not None:
>          __branch_is_true(3)
>          __result_bool = True
>          __result_obj = True
>        else:
>          __branch_is_false(3)
>
>          if stderr is not None:
>            __branch_is_true(4)
>            __result_bool = True
>            __result_obj = True
>          else:
>            __branch_is_false(4)
>            __result_bool = False
>            __result_obj = False
>    else:
>      __branch_is_false(1)
>      __result_bool = False
>      __result_obj = close_fds
>
>    if __result_bool:
>      __reached_statement(101)
>      raise ValueError("close_fds is not supported on Windows "
>                       "platforms if you redirect stdin/stdout/stderr")
>
> where __branch_is_true, __branch_is_false and __reached_statement
> keep track of which branch points and lines were executed.  (They also
> record a filename, module name, and md5 checksum to prevent version skew.)
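A minimal sketch of what such recorders might look like, assuming simple in-process counters. The names _reached_statement, _branch_is_true, _branch_is_false and missing_branch_outcomes are hypothetical. They use one leading underscore instead of two because Python mangles identifiers that start with two underscores when they appear inside a class body, so instrumented method code could not find a module-level `__branch_is_true`:

```python
from collections import Counter

# Hypothetical stand-ins for __reached_statement, __branch_is_true and
# __branch_is_false.  A single leading underscore avoids private-name
# mangling inside class bodies.
statement_counts = Counter()  # statement id -> times executed
branch_counts = Counter()     # (branch id, outcome) -> times observed


def _reached_statement(stmt_id):
    statement_counts[stmt_id] += 1


def _branch_is_true(branch_id):
    branch_counts[(branch_id, True)] += 1


def _branch_is_false(branch_id):
    branch_counts[(branch_id, False)] += 1


def missing_branch_outcomes(branch_ids):
    """Return the (branch id, outcome) pairs that were never observed."""
    return sorted(
        (bid, outcome)
        for bid in branch_ids
        for outcome in (True, False)
        if branch_counts[(bid, outcome)] == 0  # lookup does not insert a key
    )


def check(close_fds, stdin):
    # Hand-expanded form of "close_fds and stdin is not None", in the
    # same style as the expansion quoted above.
    _reached_statement(100)
    if close_fds:
        _branch_is_true(1)
        if stdin is not None:
            _branch_is_true(2)
            result_obj = True
        else:
            _branch_is_false(2)
            result_obj = False
    else:
        _branch_is_false(1)
        result_obj = close_fds
    return result_obj


check(True, None)
check(False, None)
print(statement_counts[100])            # → 2
print(missing_branch_outcomes([1, 2]))  # → [(2, True)]
```

Keeping counts rather than flags costs little, and a covered/not-covered flag can always be recovered as `count > 0`.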
>
>
> This horrible if-statement expansion is needed because it's the
> only way to keep Python's guarantees:
>
>    -- short circuiting
>
>    -- the bool check is only done once per term
>
> Otherwise something like
>
>     if _bool_check(1, close_fds) and (_bool_check(2, stdin is not None) ... )
>
> would work, where
>
>    def _bool_check(branch_number, obj):
>      if obj:
>        __branch_is_true(branch_number)
>      else:
>        __branch_is_false(branch_number)
>      return obj
>
> This is simple, but it calls bool(obj) twice: once inside _bool_check
> and again when the returned object is tested by the and/or/if.
>
>
>    -- the correct object is returned
>
> I had another hack which looked like
>
>      if (_about_to_call(1) and close_fds or is_false(1)) and  (...:
>
> along with some state tracking.  This handles the truth testing
> correctly, but does not return the correct object in an assignment like
>
>    x = a and (b or c or d)
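A small Python 3 demonstration of the double truth test and of the returned-object guarantee (Python 3 spells the truth-test hook `__bool__`; Python 2 used `__nonzero__`). The Noisy class and the recorded list are illustrative only; _bool_check has the same shape as the helper quoted above but appends to a list instead of calling the recorders:

```python
class Noisy:
    """Object that counts how many times its truth value is tested."""

    def __init__(self, value):
        self.value = value
        self.bool_calls = 0

    def __bool__(self):
        self.bool_calls += 1
        return bool(self.value)


recorded = []  # (branch number, outcome) pairs, standing in for the recorders


def _bool_check(branch_number, obj):
    # Same shape as the quoted helper: record the outcome, return obj itself.
    if obj:
        recorded.append((branch_number, True))
    else:
        recorded.append((branch_number, False))
    return obj


plain = Noisy(1)
if plain:
    pass
print(plain.bool_calls)    # → 1

checked = Noisy(1)
if _bool_check(1, checked):
    pass
print(checked.bool_calls)  # → 2: once inside _bool_check, once by the if

# and/or return one of their operands, not a bool, so any instrumented
# version has to hand back that same object.
a, b, c, d = "a", "", 0, "d"
x = a and (b or c or d)
print(repr(x))             # → 'd'
```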
>
>
> Once I have the modified AST, what should I do with it?
>
> I could generate raw Python code, which would be ugly, would have the
> comments stripped out, and would have its line numbers changed.  Or I
> could generate byte code.
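For readers on a current Python 3 (3.8 or later, since it uses ast.Constant), here is a sketch of a route between those two options: transform the tree with ast.NodeTransformer and pass it straight to compile(), so no source is regenerated. Because the original statement nodes keep their lineno values, tracebacks still point at the original lines. The hook name _reached_statement and using the line number as the statement id are assumptions for illustration, and this instruments statements only, not branches:

```python
import ast
from collections import Counter


class StatementCounter(ast.NodeTransformer):
    """Insert _reached_statement(lineno) before every statement.

    Sketch only: a real tool would skip docstrings (they would stop being
    docstrings) and "from __future__" imports (they must come first).
    """

    def visit(self, node):
        node = super().visit(node)  # instruments nested statement lists first
        if isinstance(node, ast.stmt):
            call = ast.Expr(value=ast.Call(
                func=ast.Name(id="_reached_statement", ctx=ast.Load()),
                args=[ast.Constant(value=node.lineno)],
                keywords=[],
            ))
            # NodeTransformer splices a returned list into the enclosing body.
            return [ast.copy_location(call, node), node]
        return node


source = """\
def f(x):
    if x:
        return "yes"
    return "no"
"""

tree = StatementCounter().visit(ast.parse(source, filename="example.py"))
ast.fix_missing_locations(tree)  # give the new Call/Name nodes positions
code = compile(tree, "example.py", "exec")

counts = Counter()
namespace = {"_reached_statement": lambda n: counts.update([n])}
exec(code, namespace)
namespace["f"](0)
print(sorted(counts.items()))  # → [(1, 1), (2, 1), (4, 1)]: line 3 never ran
```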
>
> If the latter, I was thinking of writing a .py -> .pyc compiler, but do
> I use it like compileall?  Or do I generate the .pyc files in another
> directory, which is used for the coverage testing?  Where do I keep
> the coverage results?  Probably all in a single directory, with each
> file named after the Python module.
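One possible layout for the results, sketched under assumptions: one JSON file per module in a single results directory, with the MD5 of the module source stored next to the counts so stale results can be detected (the version-skew check mentioned earlier). The function names, the JSON layout, and the file naming are hypothetical:

```python
import hashlib
import json
import os
import tempfile


def save_coverage(results_dir, module_name, filename, source_bytes,
                  statement_counts, branch_counts):
    """Write <results_dir>/<module_name>.json and return its path.

    statement_counts maps statement id -> count; branch_counts maps
    (branch id, outcome) -> count.  JSON keys must be strings, so branch
    results are stored as [branch id, outcome, count] triples.
    """
    os.makedirs(results_dir, exist_ok=True)
    record = {
        "module": module_name,
        "filename": filename,
        "md5": hashlib.md5(source_bytes).hexdigest(),
        "statements": {str(k): v for k, v in statement_counts.items()},
        "branches": [[bid, outcome, n]
                     for (bid, outcome), n in branch_counts.items()],
    }
    path = os.path.join(results_dir, module_name + ".json")
    with open(path, "w") as f:
        json.dump(record, f, indent=1, sort_keys=True)
    return path


def load_coverage(results_dir, module_name, source_bytes):
    """Return the stored record, or None if it is missing or out of date."""
    path = os.path.join(results_dir, module_name + ".json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        record = json.load(f)
    if record["md5"] != hashlib.md5(source_bytes).hexdigest():
        return None  # the source changed after these results were recorded
    return record


with tempfile.TemporaryDirectory() as d:
    src = b"x = 1\n"
    save_coverage(d, "pkg.mod", "pkg/mod.py", src, {100: 2}, {(1, True): 1})
    print(load_coverage(d, "pkg.mod", src)["statements"])  # → {'100': 2}
    print(load_coverage(d, "pkg.mod", b"x = 2\n"))         # → None
```

A real tool would merge counts from successive runs instead of overwriting the file, and would have to cope with several test processes writing at once.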
>
> Do people only care whether each branch was true or false, or is the
> number of tests that exercised it also important?  What about the number
> of times a line was executed, vs. a flag saying that it was covered?
>
>
>
>
>                                 Andrew
>                                 dalke at dalkescientific.com
>
>
>
> _______________________________________________
> testing-in-python mailing list
> testing-in-python at lists.idyll.org
> http://lists.idyll.org/listinfo/testing-in-python
>