python-sortedcontainers icon indicating copy to clipboard operation
python-sortedcontainers copied to clipboard

Move ValueSortedDict into SortedContainers (was: Support for SortedBidict)

Open jab opened this issue 9 years ago • 7 comments

Hi Grant,

I'm the author of bidict. I just came across SortedContainers and it looks excellent. Kudos on the great work! I'm delighted to have found another high-quality Python collections library, and look forward to studying it further.

In the meantime, thanks to finding this project, I've already made a tiny change to the development version of bidict that I'm excited about: now users can make a SortedBidict type (backed by your SortedDict) in just a few lines of code:

>>> import bidict, sortedcontainers
>>>
>>> class sortedbidict(bidict.bidict):
...     _dict_class = sortedcontainers.SortedDict
>>>
>>> b = sortedbidict({'Tokyo': 'Japan', 'Cairo': 'Egypt', 'Lima': 'Peru'})
>>>
>>> list(b.items())  # b stays sorted by its keys
[('Cairo', 'Egypt'), ('Lima', 'Peru'), ('Tokyo', 'Japan')]
>>>
>>> list(b.inv.items())  # b.inv stays sorted by *its* keys (b's values!)
[('Egypt', 'Cairo'), ('Japan', 'Tokyo'), ('Peru', 'Lima')]
>>>
>>> b.index('Lima')  # passes through to SortedDict's implementation
1
>>> b.iloc[-1]
'Tokyo'
>>> b.peekitem(index=-1)
('Tokyo', 'Japan')
>>> b.inv.peekitem(index=-1)
('Peru', 'Lima')
>>> b.popitem(last=False)
('Cairo', 'Egypt')
>>> list(b.items())
[('Lima', 'Peru'), ('Tokyo', 'Japan')]
>>> list(b.inv.items())
[('Japan', 'Tokyo'), ('Peru', 'Lima')]

Documented more here.

Always interested to hear more ideas for fruitful interop and in collaboration in general. In case of interest!

Thanks for reading and best wishes.

Josh

P.S. Hope you don't mind my using your tracker for this. Please of course feel free to close / migrate to anywhere else you prefer. bidict has a (mostly inactive) channel at https://gitter.im/jab/bidict in case you'd ever like to chat there.

jab avatar Dec 07 '16 04:12 jab

Hi jab! Issue tracker is fine. Interesting use case. Thanks for implementing support and writing up the recipe. I'm impressed the extra methods pass through to SortedDict e.g. index and iloc. Does it work for the KeysView too?

grantjenks avatar Dec 07 '16 18:12 grantjenks

Thanks for implementing support and writing up the recipe.

My pleasure, thanks for your interest and the quick reply!

I'm impressed the extra methods pass through to SortedDict e.g. index and iloc.

Oh, that's because I actually added fallback attribute proxying specially with this change (see the __getattribute__ method override here). Thought that might be a nice complement to the main _dict_class change, curious what you think.

Does it work for the KeysView too?

You can call bidict.keys() (bidict.viewkeys() on Python 2.7) and get back a collections.abc.KeysView, but that's been the case since before this change, and is not because of the new __getattribute__ override, but because bidict.BidirectionalMapping inherits from collections.abc.Mapping, and so bidict.keys resolves to Mapping's keys() implementation before it has a chance to fall back to SortedDict's in the new __getattribute__ logic.

So here's what it looks like:

>>> sortedbidict().keys()
KeysView(sortedbidict())
>>> sortedbidict().keys().__class__
collections.abc.KeysView
>>> frozenbidict().keys()
KeysView(frozenbidict())
>>> orderedbidict().keys()
KeysView(orderedbidict())
>>> # etc.

My biggest concern with this _dict_class idea is that it might make users take for granted that they should be able to extend bidict with extra functionality with such simple compositions, but the reality is that things aren't always this easy. For example, what if a user wants a SortedBidict whose inverse's items have the same order as it?

>>> elementByAtomicNumber = SortedBidict({1: 'hydrogen', 2: 'helium'})
>>> list(elementByAtomicNumber.items())
[(1, 'hydrogen'), (2, 'helium')]
>>> list(elementByAtomicNumber.inv.items())  # should remain sorted by atomic number, but isn't
[('helium', 2), ('hydrogen', 1)]

If SortedDict supported sorting by value instead of by key, or even if it allowed specifying a key function that gets passed both the key and value of each item, then maybe the minimal _dict_class change could still enable this kind of extension simply (though this would mean SortedDict's key function argument would no longer work like the one that Python's built-in sorted function takes). I'm not sure how much sense it would make for SortedDict to add that change, but even if it did, I'm still concerned about over-promising extensibility with a change that has not been designed for any actual users' real use cases. Another data point is that bidict.orderedbidict originally made use of a similar mechanism to enable maintaining insertion order (i.e. it just overrode the internal dict class to be collections.OrderedDict instead of dict, and so was a super simple change), but this resulted in insertion order not being preserved when overwriting items in some cases. And because OrderedDict doesn't expose its backing linked list, to fix this I had to essentially reimplement OrderedDict by hand. So, another case where this simple _dict_class change fell short of enabling insanely easy extensibility.

Would love to know what you think about all this, and whether anything else occurs to you about how disparate collection implementations like ours could interoperate better and be designed to compose more effectively.

jab avatar Dec 07 '16 23:12 jab

Take a look at Sorted Collections: http://www.grantjenks.com/docs/sortedcollections/ that's kind of a child project of Sorted Containers. I like to collect recipes from Issues there. The ValueSortedDict and ItemSortedDict recipes are in that repo too.

grantjenks avatar Dec 08 '16 00:12 grantjenks

Thanks, that's super helpful! After replacing _dict_class with two separate _fwd_class and _inv_class attributes, look what I can do with _inv_class = ValueSortedDict (the sortedbidict2 example):

>>> import bidict, sortedcontainers
>>>
>>> # a sorted bidict whose forward items stay sorted by their keys,
>>> # and whose inverse items stay sorted by *their* keys (i.e. it and
>>> # its inverse iterate over their items in different orders):
>>>
>>> class sortedbidict1(bidict.OrderedMapping, bidict.bidict):
...     _fwd_class = sortedcontainers.SortedDict
...     _inv_class = sortedcontainers.SortedDict
>>>
>>> b = sortedbidict1({'Tokyo': 'Japan', 'Cairo': 'Egypt', 'Lima': 'Peru'})
>>>
>>> list(b.items())  # b stays sorted by its keys
[('Cairo', 'Egypt'), ('Lima', 'Peru'), ('Tokyo', 'Japan')]
>>>
>>> list(b.inv.items())  # b.inv stays sorted by *its* keys (b's values!)
[('Egypt', 'Cairo'), ('Japan', 'Tokyo'), ('Peru', 'Lima')]
>>>
>>> # a sorted bidict whose forward items stay sorted by their keys,
>>> # and whose inverse items stay sorted by their values (i.e. it and
>>> # its inverse iterate over their items in the same order):
>>>
>>> import sortedcollections
>>>
>>> class sortedbidict2(bidict.OrderedMapping, bidict.bidict):
...     _fwd_class = sortedcontainers.SortedDict
...     _inv_class = sortedcollections.ValueSortedDict
>>>
>>> elemByAtomicNum = sortedbidict2({1: 'hydrogen', 2: 'helium', 3: 'lithium'})
>>>
>>> # forward items sorted by key:
>>> list(elemByAtomicNum.items())
[(1, 'hydrogen'), (2, 'helium'), (3, 'lithium')]
>>>
>>> # inverse items sorted by value:
>>> list(elemByAtomicNum.inv.items())
[('hydrogen', 1), ('helium', 2), ('lithium', 3)]
>>>
>>> elemByAtomicNum.update({4: 'beryllium'})
>>> list(elemByAtomicNum.inv.items())
[('hydrogen', 1), ('helium', 2), ('lithium', 3), ('beryllium', 4)]
>>>
>>> # order is preserved correctly when passing .inv back into constructor:
>>> atomicNumByElem = sortedbidict2(elemByAtomicNum.inv)
>>> list(atomicNumByElem.items()) == list(elemByAtomicNum.inv.items())
True
>>> # copies of .inv preserve order correctly too:
>>> list(elemByAtomicNum.inv.copy().items()) == list(atomicNumByElem.items())
True
>>>
>>> # attrs not defined by bidict are passed through to the _fwd SortedDict:
>>> elemByAtomicNum.peekitem(index=0)
(1, 'hydrogen')
>>> elemByAtomicNum.popitem(last=False)
(1, 'hydrogen')
>>> elemByAtomicNum.inv.popitem(last=True)
('beryllium', 4)
>>>
>>> # bidict provides an OrderedMapping ABC for interoperability.
>>> # Its __eq__ method is order-sensitive when comparing against another
>>> # OrderedMapping. If we want our sortedbidicts to have order-sensitive
>>> # comparison, we can inherit from OrderedMapping before inheriting
>>> # from bidict, so OrderedMapping's __eq__ comes first in the MRO.
>>> # Since this is what we did above, the following works:
>>> elemByAtomicNum == bidict.orderedbidict([(2, 'helium'), (3, 'lithium')])
True
>>> elemByAtomicNum != bidict.orderedbidict([(3, 'lithium'), (2, 'helium')])
True
>>> # comparing against an unordered Mapping is order-insensitive:
>>> elemByAtomicNum == {3: 'lithium', 2: 'helium'}
True
>>> # Since bidict.OrderedMapping implements __subclasshook__, classes like
>>> # SortedDict and OrderedDict are detected as subclasses automatically,
>>> # and so the following works:
>>> issubclass(sortedcontainers.SortedDict, bidict.OrderedMapping)
True
>>> import collections
>>> issubclass(collections.OrderedDict, bidict.OrderedMapping)
True
>>> elemByAtomicNum == collections.OrderedDict([(2, 'helium'), (3, 'lithium')])
True
>>> elemByAtomicNum != collections.OrderedDict([(3, 'lithium'), (2, 'helium')])
True
>>> # If we wanted our sortedbidicts to be automatically picked up as
>>> # subclasses of Python 3.6's collections.abc.Reversible, we could
>>> # add a __reversed__ class attribute that just proxies the method
>>> # call to the _fwd SortedDict:
>>> class sortedbidict3(bidict.OrderedMapping, bidict.bidict):
...     _fwd_class = _inv_class = sortedcontainers.SortedDict
...     __reversed__ = lambda self: reversed(self._fwd)
>>>
>>> # Python 3.6:
>>> issubclass(sortedbidict3, collections.Reversible)  
True
>>> # bidict also provides an OrderedBidirectionalMapping ABC with a
>>> # __subclasshook__, so with the __reversed__ method implying ordering,
>>> # the following works automatically too:
>>> issubclass(sortedbidict3, bidict.OrderedBidirectionalMapping)
True

(Implemented in https://github.com/jab/bidict/commit/2ce0153, and now documented at https://bidict.readthedocs.io/en/dev/other-bidict-types.html#extending-bidict as well.)

It was a little tricky to get all the corner cases right, but I think it's shaping up. Would love a sanity check though. What do you think?

jab avatar Dec 08 '16 05:12 jab

Looks good to me. You might want to mention sortedcollections alongside sortedcontainers in the docs.

This is also motivating to consider including ValueSortedDict within SortedContainers, possibly with a faster implementation.

grantjenks avatar Dec 09 '16 00:12 grantjenks

Thanks for taking a look! I added a mention of sortedcollections alongside sortedcontainers, as you suggested. Now live at https://bidict.readthedocs.io/en/dev/other-bidict-types.html#extending-bidict. I also made some more changes to the examples, and updated the comment above with the latest. https://github.com/jab/bidict/commit/2ce0153 is the current latest revision.

Including ValueSortedDict right in SortedContainers would be nice. It'd make this recipe that much shorter to boot :)

jab avatar Dec 09 '16 04:12 jab

+1 on adding ValueSortedDict to the "core" library

laserson avatar Sep 03 '17 17:09 laserson

I think the decision has been made (perhaps through neglect) to leave ValueSortedDict in sortedcollections.

There is a separate issue to mention that library in the docs.

grantjenks avatar Aug 20 '22 15:08 grantjenks