[TIP] coverage-based test minimization

Sat Jun 5 13:51:15 PDT 2021

Hi all,

I'm looking for pointers to existing tools, especially ones based on pytest, which help reduce the number of test cases by using code coverage analysis to select a good subset. I haven't heard of any, so I thought I'd ask here.

## Background

We have a data set with 371 molecular structures. It's used in quite a few of our unit tests, and is one of the reasons the unit tests take a long time to run.

People don't want to get rid of test cases because they are rightly worried about how that would reduce code coverage.

My thought is to use coverage-based tools to reduce the number of test cases. That is, use sys.settrace() to log module/function/line (and perhaps opcode?) events for the relevant modules, run a single test case while recording a unique execution signature for that test case, and reset the trace function to None when done.
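A minimal sketch of that recording step, assuming the signature is simply the set of (filename, line number) pairs executed. The names `record_signature`, `filename_filter` and `classify` are made up for illustration and are not taken from any existing tool. It restores whatever trace function was installed before, rather than always resetting to None, so it does not disturb a debugger or coverage run that is already active.

```python
import sys

def record_signature(func, *args, filename_filter=None):
    """Run func(*args) and return the set of (filename, lineno) pairs executed.

    filename_filter is an optional predicate (hypothetical) used to restrict
    tracing to the relevant modules.
    """
    seen = set()

    def tracer(frame, event, arg):
        filename = frame.f_code.co_filename
        if filename_filter is not None and not filename_filter(filename):
            return None  # on a "call" event this disables tracing of that frame
        if event == "line":
            seen.add((filename, frame.f_lineno))
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore the earlier trace function (often None)
    return frozenset(seen)

# Toy stand-in for "one test case run on one structure".
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

sig_neg = record_signature(classify, -1)
sig_pos = record_signature(classify, 5)
print(sig_neg != sig_pos)  # the two inputs take different branches
```

Two inputs with equal signatures exercise the same lines, so under this definition one of them is redundant.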

I can then determine a small subset of those test cases which still execute all the relevant code paths, by ensuring that all execution signatures remain.
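For illustration, here is a greedy set-cover sketch of that selection step. It assumes each signature is a set of covered items (for example line numbers) and that "all signatures remain" means every covered item is still covered by the chosen subset. Greedy selection is an approximation and is not guaranteed to find the smallest subset. `greedy_cover` and the structure names are hypothetical.

```python
def greedy_cover(signatures):
    """signatures: dict mapping test-case name -> set of covered items.

    Returns a list of names whose combined coverage equals the combined
    coverage of all test cases. Greedy heuristic, not necessarily minimal.
    """
    uncovered = set().union(*signatures.values())
    chosen = []
    while uncovered:
        # Pick the case covering the most still-uncovered items;
        # sorting the names makes tie-breaking deterministic.
        best = max(sorted(signatures),
                   key=lambda name: len(signatures[name] & uncovered))
        chosen.append(best)
        uncovered -= signatures[best]
    return chosen

signatures = {
    "mol_a": {1, 2, 3},
    "mol_b": {2, 3},
    "mol_c": {4},
    "mol_d": {1, 4},
}
print(greedy_cover(signatures))  # ['mol_a', 'mol_c']
```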

## Demonstration code

I put something together last week which does that. It writes the signatures to a file in SMT format so the Z3 solver can find a minimal set cover. Code at https://hg.sr.ht/~dalke/off_coverage .
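As a rough idea of what such an encoding can look like (this is a sketch under my own assumptions, not the format the code in that repository writes): one Boolean per test case, one assertion per covered item saying at least one selected case covers it, and an objective minimizing the number of selected cases. `(minimize ...)` is a Z3 extension to SMT-LIB 2. The function name `to_smt2` and the variable names `use_N` are hypothetical, and the input must be non-empty.

```python
def to_smt2(signatures):
    """signatures: non-empty dict mapping test-case name -> set of covered items.

    Returns SMT-LIB 2 text (using Z3's `minimize` extension) describing the
    minimum set-cover problem.
    """
    names = sorted(signatures)
    var = {name: f"use_{i}" for i, name in enumerate(names)}
    # One Boolean per test case; the original name is kept as an SMT comment.
    lines = [f"(declare-const {var[n]} Bool) ; {n}" for n in names]
    universe = sorted(set().union(*signatures.values()))
    for item in universe:
        covering = [var[n] for n in names if item in signatures[n]]
        # Each covered item needs at least one selected test case.
        clause = covering[0] if len(covering) == 1 else "(or " + " ".join(covering) + ")"
        lines.append(f"(assert {clause})")
    costs = [f"(ite {var[n]} 1 0)" for n in names]
    total = costs[0] if len(costs) == 1 else "(+ " + " ".join(costs) + ")"
    lines += [f"(minimize {total})", "(check-sat)", "(get-model)"]
    return "\n".join(lines)

print(to_smt2({"mol_a": {1, 2}, "mol_b": {2}, "mol_c": {3}}))
```

Saving the output to a file and running `z3` on it should report `sat` plus a model in which the variables set to true are the selected test cases.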

Assuming I did everything correctly, we can reduce our test set down to 58 structures.

That's a big assumption. Not only is it barely-tested one-day-old spike code, but I only looked at line coverage. I think a better solution is to handle a 2- or 3-tuple of line execution orders, as that's closer to handling branch coverage.
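To make the tuple idea concrete, here is a small sketch that turns an ordered line trace into the set of n-tuples of consecutively executed lines. It assumes the trace is kept as a list in execution order instead of a set; `line_ngrams` is a hypothetical name.

```python
def line_ngrams(trace, n=2):
    """trace: list of executed line identifiers, in execution order.

    Returns the set of n-tuples of consecutively executed lines.
    """
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

# Two runs that execute the same lines, but take the branches in a different order.
run_1 = [1, 2, 3, 2, 4]
run_2 = [1, 2, 4, 2, 3]

print(set(run_1) == set(run_2))                   # True: identical line coverage
print(line_ngrams(run_1) == line_ngrams(run_2))   # False: the transitions differ
```

With plain line coverage these two runs are indistinguishable, while the 2-tuple signature separates them because (3, 2) appears only in the first and (4, 2) only in the second.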

## Elaboration of my request

Have others used this approach before? Are test tools which use it available? Can anyone here provide feedback about their experience with them?

I don't even have a good name for "coverage-based set-cover minimization of test sets", which uses both "set" and "cover" in two rather different ways. What is it called?

Best regards,

				Andrew
				dalke at dalkescientific.com