[TIP] branch coverage
Andrew Dalke
dalke at dalkescientific.com
Thu Jan 24 05:41:26 PST 2008
Hi all,
m h sesquile at gmail.com:
> I really don't want to write a python compiler, but
> am assuming tracing at that level won't be supported in cpython, but
> perhaps might in pypy....
On Jan 24, 2008, at 12:32 PM, Laura Creighton wrote:
> I think that Andrew Dalke, who is cc'd to this note, has already done
> a bunch of work which would be very useful to you.
I was talking with Laura a few days ago about a lightning talk I
wanted to present at PyCon - branch coverage. There's no easy way to
do it with Python. The closest is to use the compiler module to
generate the AST then instrument the AST. The problem is, the
compiler module is a bear to work with and doesn't record everything
I want. For example, in the coverage report I want to see which
branches weren't covered, pinpointed to the character range of the
expression. Python's AST doesn't have byte positions.
What I've been doing over the last couple of days is getting a PLY
grammar for Python. As of last night it parses the entire standard
library (and builds a trivial concrete syntax tree for it). After
the AST works I plan to convert code like this (from line 548 of
subprocess.py)
    if close_fds and (stdin is not None or stdout is not None or
                      stderr is not None):
        raise ValueError("close_fds is not supported on Windows "
                         "platforms if you redirect stdin/stdout/stderr")
into something equivalent to this mess
    __reached_statement(100)  # assuming this is statement number 100
    if close_fds:
        __branch_is_true(1)  # each branch also gets a unique id, with some
                             # table mapping that to source file and byte range
        if stdin is not None:
            __branch_is_true(2)
            __result_bool = True
            __result_obj = stdin
        else:
            __branch_is_false(2)
            if stdout is not None:
                __branch_is_true(3)
                __result_bool = True
                __result_obj = stdout
            else:
                __branch_is_false(3)
                if stderr is not None:
                    __branch_is_true(4)
                    __result_bool = True
                    __result_obj = stderr
                else:
                    __branch_is_false(4)
                    __result_bool = False
                    __result_obj = stderr
    else:
        __branch_is_false(1)
        __result_bool = False
        __result_obj = close_fds
    if __result_bool:
        __reached_statement(101)
        raise ValueError("close_fds is not supported on Windows "
                         "platforms if you redirect stdin/stdout/stderr")
where __branch_is_true and __branch_is_false and __reached_statement
keep track of which branch points and lines were executed. (Along
with a filename, module name, and md5 checksum to prevent version skew.)
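As a rough sketch of that bookkeeping, assuming plain in-memory counters:
the three functions below drop the leading double underscores, and
expanded() is a hypothetical hand-expansion of "a and (b or c)" in the
same style as above, not output from the real tool. It also carries the
object along so the expression's value stays the same as in plain Python.

```python
from collections import Counter

# Hypothetical in-memory stores. A real tool would also key these by
# filename, module name, and checksum.
statements_reached = Counter()
branches_true = Counter()
branches_false = Counter()


def reached_statement(statement_id):
    statements_reached[statement_id] += 1


def branch_is_true(branch_id):
    branches_true[branch_id] += 1


def branch_is_false(branch_id):
    branches_false[branch_id] += 1


def expanded(a, b, c):
    """Hand-expanded `a and (b or c)`; each term is truth-tested once."""
    reached_statement(100)
    if a:
        branch_is_true(1)
        if b:
            branch_is_true(2)
            result = b
        else:
            branch_is_false(2)
            if c:
                branch_is_true(3)
            else:
                branch_is_false(3)
            result = c  # `or` yields its last operand either way
    else:
        branch_is_false(1)
        result = a  # `and` short-circuits and yields the falsy operand
    return result
```

Using Counter rather than a set keeps the execution counts around, so
either "was it covered" or "how many times" can be reported later.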
This horrible if statement expansion mess is needed because it's the
only way to keep Python guarantees:
-- short circuiting
-- the bool check is only done once per term
   Otherwise something like
       if _bool_check(1, close_fds) and (_bool_check(2, stdin is not None) ... )
   would work, where
       def _bool_check(branch_number, obj):
           if obj:
               __branch_is_true(branch_number)
           else:
               __branch_is_false(branch_number)
           return obj
   This is simple, but it calls bool(obj) twice.
-- the correct object is returned
   I had another hack which looked like
       if (_about_to_call(1) and close_fds or is_false(1)) and (...:
   along with some state tracking. This handles the booleanness
   correctly, but does not return the correct object during assignment
       x = a and (b or c or d)
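The double truth test from the second point can be made visible with a
small experiment. This is only a sketch: the Noisy class and the hits
list are made up for the demonstration, and it uses the Python 3
spelling __bool__ (the hook was called __nonzero__ in Python 2).

```python
class Noisy:
    """Object that counts how often its truth value is requested."""

    def __init__(self, value):
        self.value = value
        self.bool_calls = 0

    def __bool__(self):
        self.bool_calls += 1
        return self.value


hits = []  # stand-in for the real branch bookkeeping


def _bool_check(branch_number, obj):
    # First truth test of obj happens in this `if`.
    if obj:
        hits.append((branch_number, True))
    else:
        hits.append((branch_number, False))
    return obj


plain = Noisy(True)
if plain and True:  # uninstrumented: one truth test
    pass

wrapped = Noisy(True)
if _bool_check(1, wrapped) and True:  # second truth test happens in `and`
    pass

print(plain.bool_calls, wrapped.bool_calls)
```

For most objects the extra call is harmless, but a __bool__ with side
effects or a high cost would behave differently under instrumentation.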
Once I have the modified AST, what should I do with it?
I could generate raw Python code, which would be ugly, have the
comments stripped out, and line numbers changed. Or I could generate
byte code.
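To make the byte code option concrete, here is a minimal sketch. It
assumes the standard ast module and the built-in compile() from a
current Python 3, not the PLY-based tree described above, and the names
StatementMarker, _reached, and reached are invented for the example. It
only marks statements, not branches, and it ignores details such as
docstrings and __future__ imports.

```python
import ast

reached = set()  # line numbers of statements that actually ran


def _reached(lineno):
    reached.add(lineno)


class StatementMarker(ast.NodeTransformer):
    """Insert a `_reached(<lineno>)` call before every statement."""

    def generic_visit(self, node):
        super().generic_visit(node)
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            if not isinstance(stmts, list):
                continue  # e.g. the expression body of a lambda
            new = []
            for stmt in stmts:
                if isinstance(stmt, ast.stmt):
                    call = ast.Expr(value=ast.Call(
                        func=ast.Name(id="_reached", ctx=ast.Load()),
                        args=[ast.Constant(value=stmt.lineno)],
                        keywords=[]))
                    # Reuse the statement's position so reported line
                    # numbers still match the original source.
                    new.append(ast.copy_location(call, stmt))
                new.append(stmt)
            setattr(node, field, new)
        return node


source = """\
x = 5
if x > 3:
    y = "big"
else:
    y = "small"
"""

tree = StatementMarker().visit(ast.parse(source, "<example>"))
ast.fix_missing_locations(tree)  # fill in positions of the new child nodes
code = compile(tree, "<example>", "exec")

namespace = {"_reached": _reached}
exec(code, namespace)
print(sorted(reached))  # line 5 (the else body) is never recorded
```

Because the tree is compiled directly, no instrumented source text is
ever written out, so the original file keeps its comments and tracebacks
keep pointing at the original line numbers.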
If the latter, I was thinking of writing a .py -> .pyc compiler, but do
I use it like compileall? Or do I generate the .pyc files in another
directory, which is used for the coverage testing? Where do I keep
the coverage results? Probably all in a single directory, named after
the Python module name.
Do people only care whether the branch was true/false, or is the
number of tests also important? What about the number of times a
line was executed, vs. a flag saying that it was covered?
Andrew
dalke at dalkescientific.com