[TIP] Column questions.
Andrew Dalke
dalke at dalkescientific.com
Thu Jul 24 16:13:37 PDT 2008
Tim Head wrote:
> True. Still it seems awfully shaky, a change in the number of rand()
> calls between you setting the seed and checking the answer of your
> function will make your test fail.
I assume you're not calling rand(3C) but mean that as a synecdoche? ;)
15 years ago we used drand48, but I think the present standard is
the Mersenne Twister.
Yes, it's a mess.
You can minimize it somewhat by using multiple RNGs, for example one
for initial velocity assignment (if needed) and another for
perturbations during the simulation, but then I worry about subtle
correlations between them. At the very least, don't use the same RNG
for choosing a random display color as you do for your Monte Carlo code.
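A minimal sketch of that separate-streams idea in Python (the names
and seeds here are made up for illustration): give the Monte Carlo
code and the cosmetic code their own independently seeded generators,
so extra draws in one stream never shift the sequence seen by the other.

```python
import random

# Hypothetical setup: one generator per concern, seeded separately.
mc_rng = random.Random(12345)       # Monte Carlo moves only
display_rng = random.Random(67890)  # cosmetic choices (colors, etc.)

def random_color(rng):
    # Drawing from display_rng never perturbs the physics stream.
    return tuple(rng.randrange(256) for _ in range(3))

def mc_step(rng):
    # Unaffected by however many colors were drawn elsewhere.
    return rng.gauss(0.0, 1.0)

color = random_color(display_rng)
step = mc_step(mc_rng)
```

Because the streams are independent, adding or removing a color draw
leaves the Monte Carlo sequence byte-for-byte identical.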
Here's a subtle question: how do you seed all of the RNGs in
a distributed system? If they all use the same seed then, in
the extreme case of random motions applied to a particle system
with each particle on a different machine, everything is
going to move rigidly.
I don't know the answer to this question. I came up with
an algorithm that avoided the obvious problems with congruential
PRNGs, but I never proved it was free of flaws. You
can see it still worries me today.
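One common approach (this is a sketch, not the algorithm I alluded
to above; the master seed and node ids are invented for illustration)
is to derive each machine's seed by hashing a master seed together
with a unique node identifier:

```python
import hashlib
import random

def node_seed(master_seed, node_id):
    # Hash master seed + node id into a well-mixed 64-bit seed,
    # so every machine gets a distinct, reproducible stream.
    digest = hashlib.sha256(f"{master_seed}:{node_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

rngs = {nid: random.Random(node_seed(42, nid)) for nid in range(4)}
# Each node draws a different sequence, so particles on different
# machines do not all move rigidly together.
moves = [rngs[nid].gauss(0.0, 1.0) for nid in range(4)]
```

The derivation is deterministic, so a run can still be reproduced
from the single master seed.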
> Indeed, but I always thought it was a bit dirty to only be able to say
> "well on average our MC generated W and Z bosons have the right mass".
> Ideally I'd like to be able to say that it works always, not just on
> average, there might be some subtle canceling out effect going on,
> plus these statistical tests tend to be a bit tedious to do so maybe
> you dont run them after every late night "tuning up" of the
> algorithms.
I can't see how there will be a general solution. In some cases you
could do additional statistical analysis to see if your results
are as expected. In other cases there are computable stable points,
like putting objects at the L5 point of a gravitational system, where
you can check whether it really stays stable and orbits the point
correctly.
I've done things like insert deliberate errors at the places likely to
be wrong, to check if my checks really catch the error. After working
with code for a while you get a sense of what sorts of errors are
likely in different parts.
Some of the parts can be tested independently from the RNG.
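For example, in Python you can inject a deterministic stand-in for
the RNG (the perturbation step below is hypothetical) and then check
the surrounding arithmetic exactly rather than statistically:

```python
def perturb(positions, rng):
    # Hypothetical simulation step: jitter each coordinate.
    return [x + rng.gauss(0.0, 0.1) for x in positions]

class FakeRNG:
    """Deterministic stand-in: replays a fixed script of 'random' draws."""
    def __init__(self, values):
        self.values = iter(values)

    def gauss(self, mu, sigma):
        return next(self.values)

# With the randomness pinned down, the result is exactly predictable.
out = perturb([1.0, 2.0], FakeRNG([0.5, -0.5]))
assert out == [1.5, 1.5]
```

The test stays valid no matter how many extra RNG calls appear
elsewhere in the program, which addresses the fragility mentioned
at the start of this thread.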
Code walkthroughs can work as well. Checking against other
implementations can also help. A friend of mine checked his
force equations against several other programs and was able
to point out how one of the programs had copied that section
of code from another - in violation of the license.
In any case, you know the result won't always work. An RNG
could in theory give "0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0" and
that would give you an unexpected answer.
There are plenty of cautionary tales about this. One is
of rocket scientists computing a trajectory. They used
several people and took the consensus, only to find that
more than 50% had made the same sign error.
If you're really worried about this, you could implement
your code twice, in two very different ways. That's
how the Pentium FDIV bug was discovered:
http://www.trnicely.net/pentbug/pentbug.html
He used both float division and scaled integer division,
and compared the results.
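In the same spirit, here's a tiny Python sketch of that kind of
cross-check: one path uses float division, the other scaled integer
division (the scale factor is an arbitrary choice), and any
disagreement beyond the scale's precision flags a problem.

```python
def float_div(a, b):
    # First path: ordinary floating-point division.
    return a / b

def scaled_int_div(a, b, scale=10**12):
    # Second path: fixed-point integer arithmetic, independent of
    # the floating-point division hardware.
    return (a * scale // b) / scale

# Cross-check the two implementations on a few inputs, including
# 4195835/3145727, the operand pair from the FDIV bug reports.
for a, b in [(1, 3), (22, 7), (4195835, 3145727)]:
    assert abs(float_div(a, b) - scaled_int_div(a, b)) < 1e-9
```

Neither path is trusted on its own; it's the agreement between two
very different computations that builds confidence.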
At one company I was at, we implemented a search algorithm
twice: once using a genetic algorithm (so plenty of random
numbers) and once using the full graph search.
The GA solution took two days to implement and gave
useful intermediate results. The graph search took
something like 6 weeks to implement. One of the ways
to prove the latter was correct was that it never gave
results worse than the GA.
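The cross-check itself is simple to state. A toy Python version,
with a random-sampling stand-in for the GA and an exhaustive
stand-in for the graph search (both invented for illustration),
might look like:

```python
import random

def exact_best(items):
    # Exhaustive search: guaranteed optimum on small inputs.
    return max(items)

def heuristic_best(items, rng, tries=10):
    # Randomized sampling as a stand-in for the GA: may miss the
    # optimum, but must never exceed it.
    return max(rng.choice(items) for _ in range(tries))

# Cross-check: the exact answer can never be worse than the
# heuristic's answer on the same input.
rng = random.Random(0)
data = [rng.randrange(1000) for _ in range(50)]
assert exact_best(data) >= heuristic_best(data, rng)
```

Run over many random inputs, a single violation of that inequality
proves a bug in one implementation or the other.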
There are even a few cases where well-known results had
to be withdrawn because of implementation errors that
weren't caught until years later.
What level of certainty do you want? That's also based
on the level of certainty your PhD advisors want, and the
likelihood of publishing something wrong and having it
be found by someone else.
> As an eager follower of his blog I have already looked at these ;)
Did you already listen to the podcast interview?
I don't follow his blog so don't know if he linked to it.
> It makes me unhappy to see that so many people get away with not being
> able to prove that their code has no (obvious) bugs. This certainly is
> my impression at undergraduate level.
If it's of any help, here's some calibration points:
"violates conservation of energy" == "obvious bug"
"subtle canceling out effect going on" == "not obvious bug"
:)
Andrew
dalke at dalkescientific.com
More information about the testing-in-python
mailing list