[TIP] testing and hash values

Sun Sep 29 22:09:51 PDT 2013

On Sun, Sep 29, 2013 at 6:14 PM, Ned Batchelder <ned at nedbatchelder.com> wrote:
> On 9/29/13 4:19 PM, Chris Jerdonek wrote:
>> On Sun, Sep 29, 2013 at 12:20 PM, Ned Batchelder <ned at nedbatchelder.com>
>> wrote:
>>> On 9/29/13 12:56 PM, Chris Jerdonek wrote:
>>>>
>>>> I have a question about the behavior of hashing prior to Python 3.3
>>>> (when hash randomization was turned on by default [1]).
>>>>
>>>> I know that in earlier versions Python never made any guarantees about
>>>> hash values and their effect on dictionary key ordering, etc [2].  But
>>>> for testing purposes, in practice, to what extent does hashing behave
>>>> the same across systems and Python versions prior to Python 3.3?  For
>>>> example, the note at [2] says that "it typically varies between 32-bit
>>>> and 64-bit builds."
>>>>
>>>> I'm asking because I'm curious about the extent to which tests that
>>>> unknowingly depend on hash values are reproducible across systems and
>>>> versions.
>>>
>>>
>>> Tests like that are not reproducible across systems and versions. They
>>> may
>>> not be reproducible as the product code changes.  Two equal dicts may not
>>> iterate in the same order, even within a single process:
>>
>> What explains the following then?  For quite a while, the unit tests
>> for a project I maintained always passed for all versions, but only
>> when I added 3.3 (when hash randomization was enabled) did I get
>> intermittent failures on such a test.  Do some such tests tend to
>> behave the same *in practice* -- even to a limited extent?  Otherwise,
>> I would have expected to see test failures in the earlier versions, at
>> least sometime.
>
>
> Many dictionaries will behave the same across versions and systems. In the
> example I gave below, if the keys were integers instead of strings, the two
> dicts would iterate the same.  It all comes down to the hash values of your
> actual keys.
>
> When I said "tests like that are not reproducible," I didn't mean that they
> would actually behave differently.  I meant that you couldn't count on them
> always behaving the same.
>
> Your test dictionaries happened to fall into a reproducible scenario.  The
> fact that they always behaved the same doesn't change the fact: you were
> relying on undefined behavior (the iteration sequence of dict keys).  Python
> 3.3 shook things up enough for it to actually change the outcome of your
> program.

Hmm, there's a subtle distinction here.  There's a difference between
the questions of whether equal dictionaries iterate to the same order
and whether the same code yields the same values on different systems
and versions.

For example, I ran your sample code on Python 2.7, 3.2, 3.3 (on 3.3
setting PYTHONHASHSEED=0), and PyPy 1.9, and all yielded the same
representations of d1 and d2.  So if a test depended on that ordering
of d2, it seems like it would be reproducible across versions at least
for those versions.

To rephrase my question, I'm asking to what extent code can be
expected or guaranteed to behave the same if the hash seed is the same
(as it was by default prior to 3.3).  Can you provide an example of
code that behaves differently on some systems or versions?

I'm asking because if a test is flaky because of its dependence on
hash values, it would be useful to know that knowing the hash seed is
enough to reliably reproduce the test failure (e.g. if some user or CI
server encountered a sporadic failure).  Otherwise, reproducing the
failure would be harder.

--Chris

>
> --Ned.
>
>
>> --Chris
>>
>>
>>>>>> d1 = dict.fromkeys(str(i) for i in range(10))
>>>>>> d2 = dict.fromkeys(str(i) for i in range(1000000))
>>>>>> for i in range(10, 1000000):
>>>
>>> ...   del d2[str(i)]
>>> ...
>>>>>>
>>>>>> d1 == d2
>>>
>>> True
>>>>>>
>>>>>> d1
>>>
>>> {'1': None, '0': None, '3': None, '2': None, '5': None, '4': None, '7':
>>> None, '6': None, '9': None, '8': None}
>>>>>>
>>>>>> d2
>>>
>>> {'9': None, '1': None, '5': None, '2': None, '0': None, '3': None, '4':
>>> None, '6': None, '7': None, '8': None}
>>>
>>> Be careful out there...
>>>
>>> --Ned.
>>>>
>>>> --Chris
>>>>
>>>> [1]
>>>> http://docs.python.org/3/whatsnew/3.3.html#builtin-functions-and-types
>>>> [2] http://docs.python.org/2/using/cmdline.html#cmdoption-R
>>>>
>>>> _______________________________________________
>>>> testing-in-python mailing list
>>>> testing-in-python at lists.idyll.org
>>>> http://lists.idyll.org/listinfo/testing-in-python
>>>
>>>
>