[TIP] testing and hash values

John Wong gokoproject at gmail.com
Mon Sep 30 03:34:48 PDT 2013


You might have a better luck posting this to the general Python mailing
list since the majority of people are there.
But do you happen to the the PEP regarding these issues?

It's interesting to hear the Python 3 decision. Though I have also been
warned not to trust the output too in the past even back in 2.6




On Mon, Sep 30, 2013 at 6:19 AM, Ned Batchelder <ned at nedbatchelder.com>wrote:

> On 9/30/13 1:09 AM, Chris Jerdonek wrote:
>
>> On Sun, Sep 29, 2013 at 6:14 PM, Ned Batchelder <ned at nedbatchelder.com>
>> wrote:
>>
>>> On 9/29/13 4:19 PM, Chris Jerdonek wrote:
>>>
>>>> On Sun, Sep 29, 2013 at 12:20 PM, Ned Batchelder <ned at nedbatchelder.com
>>>> >
>>>> wrote:
>>>>
>>>>> On 9/29/13 12:56 PM, Chris Jerdonek wrote:
>>>>>
>>>>>> I have a question about the behavior of hashing prior to Python 3.3
>>>>>> (when hash randomization was turned on by default [1]).
>>>>>>
>>>>>> I know that in earlier versions Python never made any guarantees about
>>>>>> hash values and their effect on dictionary key ordering, etc [2].  But
>>>>>> for testing purposes, in practice, to what extent does hashing behave
>>>>>> the same across systems and Python versions prior to Python 3.3?  For
>>>>>> example, the note at [2] says that "it typically varies between 32-bit
>>>>>> and 64-bit builds."
>>>>>>
>>>>>> I'm asking because I'm curious about the extent to which tests that
>>>>>> unknowingly depend on hash values are reproducible across systems and
>>>>>> versions.
>>>>>>
>>>>>
>>>>> Tests like that are not reproducible across systems and versions. They
>>>>> may
>>>>> not be reproducible as the product code changes.  Two equal dicts may
>>>>> not
>>>>> iterate in the same order, even within a single process:
>>>>>
>>>> What explains the following then?  For quite a while, the unit tests
>>>> for a project I maintained always passed for all versions, but only
>>>> when I added 3.3 (when hash randomization was enabled) did I get
>>>> intermittent failures on such a test.  Do some such tests tend to
>>>> behave the same *in practice* -- even to a limited extent?  Otherwise,
>>>> I would have expected to see test failures in the earlier versions, at
>>>> least sometime.
>>>>
>>>
>>> Many dictionaries will behave the same across versions and systems. In
>>> the
>>> example I gave below, if the keys were integers instead of strings, the
>>> two
>>> dicts would iterate the same.  It all comes down to the hash values of
>>> your
>>> actual keys.
>>>
>>> When I said "tests like that are not reproducible," I didn't mean that
>>> they
>>> would actually behave differently.  I meant that you couldn't count on
>>> them
>>> always behaving the same.
>>>
>>> Your test dictionaries happened to fall into a reproducible scenario.
>>>  The
>>> fact that they always behaved the same doesn't change the fact: you were
>>> relying on undefined behavior (the iteration sequence of dict keys).
>>>  Python
>>> 3.3 shook things up enough for it to actually change the outcome of your
>>> program.
>>>
>> Hmm, there's a subtle distinction here.  There's a difference between
>> the questions of whether equal dictionaries iterate to the same order
>> and whether the same code yields the same values on different systems
>> and versions.
>>
>> For example, I ran your sample code on Python 2.7, 3.2, 3.3 (on 3.3
>> setting PYTHONHASHSEED=0), and PyPy 1.9, and all yielded the same
>> representations of d1 and d2.  So if a test depended on that ordering
>> of d2, it seems like it would be reproducible across versions at least
>> for those versions.
>>
>> To rephrase my question, I'm asking to what extent code can be
>> expected or guaranteed to behave the same if the hash seed is the same
>> (as it was by default prior to 3.3).  Can you provide an example of
>> code that behaves differently on some systems or versions?
>>
>
> Using my same code example, but printing the keys more compactly, I get
> this:
>
>    $ python2.4 dictorder.py
>    d1: 1,0,3,2,5,4,7,6,9,8
>    d2: 9,2,1,5,0,3,4,6,7,8
>    $ python2.5 dictorder.py
>    d1: 1,0,3,2,5,4,7,6,9,8
>    d2: 9,1,5,2,0,3,4,6,7,8
>    $ jython dictorder.py
>    d1: 8,1,7,5,0,2,6,3,9,4
>    d2: 0,8,5,1,7,6,4,2,9,3
>
> I don't know what changed between 2.4 and 2.5, I just know that I wasn't
> allowed to complain about it.  Every version since 2.5 has agreed with 2.5.
>  Jython of course is a completely different beast. Dict iteration order
> isn't guaranteed.
>
> --Ned.
>
>
>> I'm asking because if a test is flaky because of its dependence on
>> hash values, it would be useful to know that knowing the hash seed is
>> enough to reliably reproduce the test failure (e.g. if some user or CI
>> server encountered a sporadic failure).  Otherwise, reproducing the
>> failure would be harder.
>>
>> --Chris
>>
>>
>>
>>  --Ned.
>>>
>>>
>>>  --Chris
>>>>
>>>>
>>>>  d1 = dict.fromkeys(str(i) for i in range(10))
>>>>>>>> d2 = dict.fromkeys(str(i) for i in range(1000000))
>>>>>>>> for i in range(10, 1000000):
>>>>>>>>
>>>>>>> ...   del d2[str(i)]
>>>>> ...
>>>>>
>>>>>> d1 == d2
>>>>>>>>
>>>>>>> True
>>>>>
>>>>>> d1
>>>>>>>>
>>>>>>> {'1': None, '0': None, '3': None, '2': None, '5': None, '4': None,
>>>>> '7':
>>>>> None, '6': None, '9': None, '8': None}
>>>>>
>>>>>> d2
>>>>>>>>
>>>>>>> {'9': None, '1': None, '5': None, '2': None, '0': None, '3': None,
>>>>> '4':
>>>>> None, '6': None, '7': None, '8': None}
>>>>>
>>>>> Be careful out there...
>>>>>
>>>>> --Ned.
>>>>>
>>>>>> --Chris
>>>>>>
>>>>>> [1]
>>>>>> http://docs.python.org/3/**whatsnew/3.3.html#builtin-**
>>>>>> functions-and-types<http://docs.python.org/3/whatsnew/3.3.html#builtin-functions-and-types>
>>>>>> [2] http://docs.python.org/2/**using/cmdline.html#cmdoption-R<http://docs.python.org/2/using/cmdline.html#cmdoption-R>
>>>>>>
>>>>>> ______________________________**_________________
>>>>>> testing-in-python mailing list
>>>>>> testing-in-python at lists.idyll.**org<testing-in-python at lists.idyll.org>
>>>>>> http://lists.idyll.org/**listinfo/testing-in-python<http://lists.idyll.org/listinfo/testing-in-python>
>>>>>>
>>>>>
>>>>>
>
> ______________________________**_________________
> testing-in-python mailing list
> testing-in-python at lists.idyll.**org <testing-in-python at lists.idyll.org>
> http://lists.idyll.org/**listinfo/testing-in-python<http://lists.idyll.org/listinfo/testing-in-python>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.idyll.org/pipermail/testing-in-python/attachments/20130930/dd79959d/attachment-0001.html>


More information about the testing-in-python mailing list