[TIP] testing and hash values

Mon Sep 30 03:36:11 PDT 2013

On Sun, Sep 29, 2013 at 09:56:51AM -0700, Chris Jerdonek wrote:
> I have a question about the behavior of hashing prior to Python 3.3
> (when hash randomization was turned on by default [1]).
> 
> I know that in earlier versions Python never made any guarantees about
> hash values and their effect on dictionary key ordering, etc [2].  But
> for testing purposes, in practice, to what extent does hashing behave
> the same across systems and Python versions prior to Python 3.3?  For
> example, the note at [2] says that "it typically varies between 32-bit
> and 64-bit builds."
> 
> I'm asking because I'm curious about the extent to which tests that
> unknowingly depend on hash values are reproducible across systems and
> versions.

When 64-bit Linux systems first showed up, they exposed bugs in the test
suite of our application where tests were inadvertently relying on hash
values.

This was on Python 2.5 or 2.6.  Specifically, if hash(some_string)
computed a 32-bit hash value with the high bit set, 32-bit Pythons would
interpret it as a negative int, while 64-bit Pythons would interpret it
as a positive int.  (IIRC.)
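
A rough sketch of that effect, assuming the same bit pattern is simply
read back at two different widths.  This is not CPython's actual string
hash; the helper names and the sample pattern are made up for
illustration.

```python
import struct

def as_signed_32(bits):
    # Reinterpret the low 32 bits as a signed 32-bit integer.
    return struct.unpack("<i", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def as_signed_64(bits):
    # Reinterpret the low 64 bits as a signed 64-bit integer.
    return struct.unpack("<q", struct.pack("<Q", bits & 0xFFFFFFFFFFFFFFFF))[0]

# Hypothetical hash bit pattern with the high bit of the low 32 bits set.
pattern = 0x80000001

print(as_signed_32(pattern))  # negative when read as a 32-bit value
print(as_signed_64(pattern))  # positive when read as a 64-bit value
```

Anything downstream that depends on the sign or magnitude of the value
(bucket selection, seeding) can then differ between the two builds.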

This had a bunch of knock-on effects, like different dict iteration
order when a dict contains string keys, and different random numbers
from random.Random('seeded by some string value').

Needless to say, we fixed our tests.
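
One way such a fix can look, as a minimal sketch: sort before
serializing, and compare containers rather than their printed form.
The summarize() function here is hypothetical, not code from our
application.

```python
def summarize(counts):
    # Hypothetical code under test.  Sorting the keys makes the output
    # independent of dict iteration order (and thus of hash values).
    return ", ".join("%s=%d" % (key, counts[key]) for key in sorted(counts))

counts = {"spam": 2, "eggs": 1, "ham": 3}

# Stable on any build: the order comes from sorted(), not from the dict.
assert summarize(counts) == "eggs=1, ham=3, spam=2"

# Compare as sets/dicts instead of relying on iteration order.
assert set(counts) == {"eggs", "ham", "spam"}
```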

Incidentally, do not rely on random.Random(some_fixed_seed) for
deterministic test data sequences.  It's not just hash differences --
the algorithms for randrange() and choice() have changed in Python 3.x.
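
If a test needs a reproducible pseudo-random sequence, one option is to
keep the generator inside the test code so that its output does not
depend on the stdlib implementation.  A minimal sketch, assuming a
plain linear congruential generator is good enough for test data; the
names and constants are illustrative choices, not a recommendation for
anything beyond tests.

```python
def lcg(seed, a=1103515245, c=12345, modulus=2 ** 31):
    # Simple linear congruential generator.  The sequence is fully
    # defined by this code, so it is the same on every Python version.
    state = seed
    while True:
        state = (a * state + c) % modulus
        yield state

def pick(gen, items):
    # Deterministic stand-in for random.choice(); 'items' must be a
    # non-empty sequence.
    return items[next(gen) % len(items)]

gen = lcg(42)
sample = [pick(gen, ["red", "green", "blue"]) for _ in range(5)]
```

Hard-coding the expected test data as literals is simpler still, when
the data set is small.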

Marius Gedminas
-- 
Every nonempty totally-disconnected perfect compact metric space is
homeomorphic to the Cantor set.