[TIP] 5 lines of code equals 30+ lines of single test

Mark Sienkiewicz sienkiew at stsci.edu
Tue Jul 24 13:09:51 PDT 2012


On 07/23/2012 09:03 PM, John Wong wrote:
> Thank you guys. Sorry for the late response. I've been busy with other
> things, but I've been reading comments and books on the related subject.
>
> 1. For this particular project, I use a lot of os and shutil. Most of them
> require "states". For example, os.path.listdir requires a state of the file
> system, so to me it might be a good idea to return a fake list immediately?
> shutil.delete shutil.move will also require verification of the state (the
> inputs). I think these are also valid to mock?

You can mock anything that you want to.  The question is whether it is more cost-effective to mock it or to just use the real thing.  With the mock, you have to create and test the mocked scenario, but you can gain some isolation from the environment and other parts of your system.  That isolation is the point of mocking.
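
A minimal sketch of the mocked approach, assuming Python 3's unittest.mock (the standalone "mock" package offers the same calls). The function count_logs is hypothetical and exists only to give the mock something to isolate:

```python
import os
from unittest import mock

def count_logs(directory):
    """Hypothetical function under test: count the .log files in a directory."""
    return len([name for name in os.listdir(directory) if name.endswith(".log")])

def test_count_logs_with_mock():
    # Replace os.listdir with a fake that returns a canned listing,
    # so the test never touches the real file system.
    with mock.patch("os.listdir", return_value=["a.log", "b.txt", "c.log"]) as fake:
        assert count_logs("/no/such/dir") == 2
    # The mock also records how it was called.
    fake.assert_called_once_with("/no/such/dir")

test_count_logs_with_mock()
```

The cost is that the canned listing is itself something you have to write and keep correct.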


> 2. When thinking pure mocking vs real physical fixture data (make real
> folders and files), which one do people prefer? Is that a style thing or
> actually a real concern?

There is a real concern, but it can be complicated.

I generally use real files.  It is very easy for me to make an empty directory and extract a tar/zip into it before I run my tests.  It is also very easy for me to just provide that data as part of the test.
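
As a rough sketch of that workflow, assuming the fixture is a tar file: the helper names below are hypothetical, and make_sample_tar is there only so the example is self-contained (in practice the archive would already exist alongside the tests).

```python
import os
import shutil
import tarfile
import tempfile

def make_sample_tar(tar_path):
    """Build a tiny tar file containing one file, data.txt (illustration only)."""
    src = tempfile.mkdtemp()
    try:
        member = os.path.join(src, "data.txt")
        with open(member, "w") as f:
            f.write("hello\n")
        with tarfile.open(tar_path, "w") as tar:
            tar.add(member, arcname="data.txt")
    finally:
        shutil.rmtree(src)

def extract_fixture(tar_path):
    """Make an empty directory and extract the archive into it."""
    workdir = tempfile.mkdtemp()
    with tarfile.open(tar_path) as tar:
        tar.extractall(workdir)
    return workdir
```

A test would call extract_fixture() before it runs and remove the returned directory with shutil.rmtree() afterwards.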

The questions you have to ask yourself are:  If I mock this, will I spend more time building the mocked scenario than I save by providing a dataset?  How will each approach affect the reliability of my tests?

I normally have ready access to data, both simulated and real, so I don't bother with mocking.

Initializing the test data can make your tests take longer.  I have a continuous integration system that runs tests for several hours overnight, so I literally don't care if I add a new batch of tests that runs for 2 minutes instead of 1 minute.

If you are going to run your tests every time you change a couple lines of code, maybe you care about the difference between 5 seconds and 10 seconds.  That changes the tradeoff of mock vs real files -- if you spend 15 minutes (900 seconds) creating a mock that saves 5 seconds per test run, you win that all back after 180 test runs.  If you run the tests 20 times a day, you make back your 15 minute investment in 9 working days, or about 2 weeks -- after that, you still save another 5 seconds for every test run.
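
The break-even arithmetic can be checked in a few lines. The function name is made up for this sketch; the numbers are the ones from the paragraph above.

```python
def break_even_runs(build_minutes, saved_seconds_per_run):
    """Number of test runs needed before the time spent on the mock is repaid."""
    return build_minutes * 60 / saved_seconds_per_run

runs = break_even_runs(15, 5)   # 900 seconds / 5 seconds per run = 180.0 runs
days = runs / 20                # at 20 runs a day: 9.0 days
print(runs, days)
```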

If you try to make your mock too complex, you might write a bug into your test; in that case, you may lose all of your time savings when you debug the test.  You should expect this to happen from time to time whether you use mocks or not; the point is to construct your tests so that you win more than you lose.


By the way, I do not use nose or py.test to provide the test data.  I have a separate script that creates the environment that my tests will run in, then invokes the tests.  This makes it easy to provide a fairly complex environment.  I can also set up the environment without running the tests, then run something else in the same environment that the tests would have had.
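
One way such a script could look, as a sketch only: the directory layout, the TEST_DATA_ROOT variable, and both function names are assumptions made for illustration, not a description of the actual script.

```python
import os
import subprocess
import sys
import tempfile

def build_environment():
    """Create a scratch directory holding the data the tests expect (hypothetical layout)."""
    root = tempfile.mkdtemp(prefix="testenv_")
    os.mkdir(os.path.join(root, "input"))
    with open(os.path.join(root, "input", "sample.txt"), "w") as f:
        f.write("sample data\n")
    return root

def run_in_environment(root, command):
    """Run any command with TEST_DATA_ROOT pointing at the prepared environment.

    Returns the command's exit status.
    """
    env = dict(os.environ, TEST_DATA_ROOT=root)
    return subprocess.call(command, env=env, cwd=root)
```

Because building the environment and running a command are separate steps, the same environment can be handed to a test runner or to anything else, for example run_in_environment(root, [sys.executable, "-m", "pytest"]) if pytest is installed.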


> 3. Whether with or without mock, should I always write some tests for
> checking exception? For example, reading a file may raise OSError if the
> file does not exist. I don't need to raise exception in my code because the
> library (os module in this case) will raise it in its implementation. But
> for completeness,
>         a) if I mock it, I can use side_effect to create an exception
>         b) without mocking something like open function, I will have to
> supply a non-existing path.
> To me, it seems pointless to catch exception in this particular case... if
> I am not doing anything useful like rollback (delete temp file, for
> example)? Do you guys agree?


Your idea of "pointless ... in this particular case" can be a perfectly valid engineering judgement.  Suppose you have:

def my_open(f, m):
    # Log the file name and mode, then hand off to the built-in open().
    print("Opening %s in mode %s" % (f, m))
    return open(f, m)

You don't even write a test for that -- you just run it once and see that it works.  But real life usually isn't that simple, and at some point you may decide it is important to know that your function really does raise the exception.
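
If you do decide it matters, both options from the question fit in a few lines. This is a sketch assuming Python 3 and unittest.mock; raises_oserror is a hypothetical helper, and my_open is repeated in Python 3 form so the block stands alone.

```python
import os
import tempfile
from unittest import mock

def my_open(f, m):
    print("Opening %s in mode %s" % (f, m))
    return open(f, m)

def raises_oserror(func):
    """Return True if calling func() raises OSError, else False."""
    try:
        func()
    except OSError:
        return True
    return False

# (b) Without a mock: point at a path that does not exist.
scratch = tempfile.mkdtemp()
missing = os.path.join(scratch, "does_not_exist.txt")
real_result = raises_oserror(lambda: my_open(missing, "r"))
os.rmdir(scratch)

# (a) With a mock: make the built-in open() raise via side_effect.
with mock.patch("builtins.open", side_effect=OSError("simulated failure")):
    mocked_result = raises_oserror(lambda: my_open("any_name", "r"))

print(real_result, mocked_result)
```

Either way the test only confirms that the exception from the library passes through my_open unchanged.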


> 4. So are you guys saying that anything that require some states (database,
> writing to file, etc) is okay to mock and check they are called with the
> right parameters? or do I just mock them out without checking their call
> args list?


I would not say that.  Mocking is a convenient way to test without providing REAL external state, but you still have to choose whether/when to use it in a particular test scenario.  Certainly it can be helpful in some cases, but you also want to test that your software can talk to a real database with real data.

If you have good coverage with the real database, you might be satisfied that parameters to the database calls are correct, and your mock might not bother confirming them.  In that case, I would ask "What are the mocked tests for?"  If you have a good answer that doesn't require those tests to confirm the parameters to the database calls, then you don't need to check the parameters.  If failing to confirm the parameters makes you question the value of the test, then you need to check them.
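
To make the choice concrete, here is a sketch with a hypothetical save_user function. It assumes Python 3's unittest.mock, and uses an in-memory sqlite3 database as a stand-in for "the real database".

```python
import sqlite3
from unittest import mock

def save_user(db, name):
    """Hypothetical code under test: insert one row through a database-like object."""
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return True

# Mocked, without checking parameters: only confirms the function runs.
db = mock.Mock()
assert save_user(db, "alice") is True

# Mocked, checking parameters: fails if the SQL or the arguments change.
db.execute.assert_called_once_with("INSERT INTO users (name) VALUES (?)", ("alice",))

# Real database: confirms the SQL actually works against a real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
save_user(conn, "alice")
rows = conn.execute("SELECT name FROM users").fetchall()
assert rows == [("alice",)]
conn.close()
```

If the real-database test already covers the SQL, the parameter check on the mock adds little; without it, the parameter check is the only thing the mocked test verifies about the call.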

Mark S.



