[TIP] testing and hash values

Sun Sep 29 18:14:29 PDT 2013

On 9/29/13 4:19 PM, Chris Jerdonek wrote:
> On Sun, Sep 29, 2013 at 12:20 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
>> On 9/29/13 12:56 PM, Chris Jerdonek wrote:
>>> I have a question about the behavior of hashing prior to Python 3.3
>>> (when hash randomization was turned on by default [1]).
>>>
>>> I know that in earlier versions Python never made any guarantees about
>>> hash values and their effect on dictionary key ordering, etc [2].  But
>>> for testing purposes, in practice, to what extent does hashing behave
>>> the same across systems and Python versions prior to Python 3.3?  For
>>> example, the note at [2] says that "it typically varies between 32-bit
>>> and 64-bit builds."
>>>
>>> I'm asking because I'm curious about the extent to which tests that
>>> unknowingly depend on hash values are reproducible across systems and
>>> versions.
>>
>> Tests like that are not reproducible across systems and versions. They may
>> not be reproducible as the product code changes.  Two equal dicts may not
>> iterate in the same order, even within a single process:
> What explains the following then?  For quite a while, the unit tests
> for a project I maintained always passed for all versions, but only
> when I added 3.3 (when hash randomization was enabled) did I get
> intermittent failures on such a test.  Do some such tests tend to
> behave the same *in practice* -- even to a limited extent?  Otherwise,
> I would have expected to see test failures in the earlier versions, at
> least sometime.

Many dictionaries will behave the same across versions and systems. In
the example I gave below, if the keys were integers instead of strings,
the two dicts would iterate in the same order.  It all comes down to the
hash values of your actual keys.
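
A minimal sketch of that difference, assuming CPython 3.3 or later, where
the PYTHONHASHSEED environment variable controls string hash randomization.
The helper name hash_in_fresh_process is made up for illustration. It
evaluates hash() in a new interpreter, so each call sees the seed it was given:

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expr, seed):
    """Return hash(<expr>) as computed by a new interpreter with the given seed."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    for seed in (1, 2, 3):
        # The small integer hashes to itself under every seed;
        # the string hash changes with the seed.
        print(seed, hash_in_fresh_process("7", seed),
              hash_in_fresh_process("'7'", seed))
```

With a fixed seed the string hash is repeatable from run to run, which can
help when chasing an intermittent failure.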

When I said "tests like that are not reproducible," I didn't mean that
they would necessarily behave differently.  I meant that you couldn't
count on them always behaving the same.
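
The dict example quoted further down no longer shows a difference on
Python 3.7 and later, where dicts preserve insertion order. A rough
equivalent using sets, which remain unordered, might look like the sketch
below. The function name and the sizes are arbitrary choices for
illustration, and whether the two orders actually differ on a given build
is not guaranteed either way:

```python
def build_equal_sets(small=10, large=100000):
    """Return two equal sets of strings that were built in different ways."""
    # s1 is built directly from the small range.
    s1 = {str(i) for i in range(small)}
    # s2 starts much larger and is then cut back to the same elements.
    s2 = {str(i) for i in range(large)}
    for i in range(small, large):
        s2.discard(str(i))
    return s1, s2


if __name__ == "__main__":
    s1, s2 = build_equal_sets()
    print(s1 == s2)   # True: both hold the same elements
    print(list(s1))   # iteration order is unspecified
    print(list(s2))   # and may or may not match the line above
```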

Your test dictionaries happened to fall into a reproducible scenario.
The fact that they always behaved the same doesn't change the fact that
you were relying on undefined behavior (the iteration sequence of dict
keys).  Hash randomization in Python 3.3 shook things up enough to
actually change the outcome of your tests.
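
One way to take iteration order out of a test, sketched with two
hypothetical helpers (render_fragile and render_stable are not from any
project mentioned here): sort the keys before producing anything that
gets compared.

```python
def render_fragile(d):
    """Join key=value pairs in whatever order the dict happens to iterate."""
    return ", ".join("%s=%s" % (k, v) for k, v in d.items())


def render_stable(d):
    """Join key=value pairs in sorted key order, independent of iteration order."""
    return ", ".join("%s=%s" % (k, d[k]) for k in sorted(d))


if __name__ == "__main__":
    d = {"b": 2, "a": 1}
    print(render_stable(d))   # a=1, b=2
    print(render_fragile(d))  # depends on the dict's iteration order
```

A test that asserts on render_stable(d), or that compares sorted(d) or
set(d) directly, does not care how the keys happen to be ordered.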

--Ned.

> --Chris
>
>
>> >>> d1 = dict.fromkeys(str(i) for i in range(10))
>> >>> d2 = dict.fromkeys(str(i) for i in range(1000000))
>> >>> for i in range(10, 1000000):
>> ...   del d2[str(i)]
>> ...
>> >>> d1 == d2
>> True
>> >>> d1
>> {'1': None, '0': None, '3': None, '2': None, '5': None, '4': None, '7':
>> None, '6': None, '9': None, '8': None}
>> >>> d2
>> {'9': None, '1': None, '5': None, '2': None, '0': None, '3': None, '4':
>> None, '6': None, '7': None, '8': None}
>>
>> Be careful out there...
>>
>> --Ned.
>>> --Chris
>>>
>>> [1] http://docs.python.org/3/whatsnew/3.3.html#builtin-functions-and-types
>>> [2] http://docs.python.org/2/using/cmdline.html#cmdoption-R
>>>
>>> _______________________________________________
>>> testing-in-python mailing list
>>> testing-in-python at lists.idyll.org
>>> http://lists.idyll.org/listinfo/testing-in-python
>>