[TIP] Mocking properties on an instance

Andrew Dalke dalke at dalkescientific.com
Mon Sep 12 13:04:36 PDT 2011


On Sep 12, 2011, at 1:49 PM, Julian Berman wrote:
> What I would have hoped we'd [python would have] done instead, is to expand the protocol for containers. Something like
> 
>     set(ordered=True, frozen=False)
> 
> along with some methods; freeze/unfreeze for frozen, a keep_sorted for sorting...

I was thinking about this during the preparation of my EuroPython talk on Python's various data types. That is, I wondered why there wasn't an OrderedSet (using that over "sorted" in line with collections.OrderedDict).

My conclusion was that that point of "set" wasn't just that it was a container, but instead was the set methods that you could do.

f you just want the set container nature then you can do:

>>> set = dict.fromkeys
>>> x = set(range(10))
>>> 4 in x
True
>>> 

There's an extra O(N) storage for all those None values, but Michael says we'll never notice the difference. ;)

What sets add over dictionaries is set algebra. I gave an example of an inverted index

>>> import collections
>>> index = collections.defaultdict(set)
>>> for line in open("/usr/share/dict/words"):
...   word = line.strip()
...   for c in word:
...     index[c].add(word)
... 
>>> for word in sorted(set.intersection(*(index[c] for c in "aeiuoyq"))):
...   print word
... 
anthrahydroquinone
coequality
consequentiality
consequentially
diphenylquinomethane
 ...

"intersection" does not work for dictionaries because what is

  {"name": "Andrew"} + {"name": "Dalke"}

? Hence, the essential differences between sets and dictionaries
are the things you can do with sets which you can't do with
dictionaries.


Does it makes sense to support set algebra on ordered sets ?

Not by default! For one, you'll take a hit because something like:

   qu_and_z_words = index["z"].intersection(
           word for word in (index["q"] & index["u"]) if "qu" in word)

goes from amortized linear time to O(n log n), in order to preserve
the insertion order in that generator expression.

Since it's useless in this case to preserve the insertion order,
I really don't want it to be part of the default set behavior.


While it is possible to preserve an insertion ordering with all
of the set operations (with space and time penalties, of course),
how is it useful to have

    z = x^y  # x.symmetric_difference(y)

preserve the insertion ordering? I can't think of a real
use case for set algebra on ordered sets.

Instead, most of the times where I need a ordered set, I
don't need set algebra, and I can handle it with an
OrderedDict, which can be emulated with:

>>> OrderedSet = collections.OrderedDict.fromkeys
>>> OrderedSet( ("one", "two", "three") )
OrderedDict([('one', None), ('two', None), ('three', None)])
>>> x = OrderedSet( ("one", "two", "three") )
>>> y = OrderedSet( ("three", "two", "four") )
>>> x.update(y)
>>> list(x)
['one', 'two', 'three', 'four']
>>> 



				Andrew
				dalke at dalkescientific.com





More information about the testing-in-python mailing list