Add a more idiomatic way to define hashing
The current way of using True/False/None is confusing even to me.
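For context, the three current spellings and roughly what they do (as discussed further down in this thread; the comments are my summary, not gospel):

import attr

@attr.s(hash=None)   # the "smart" (new default) behavior: write an attribute-based __hash__ only for frozen classes
class Smart:
    x = attr.ib()

@attr.s(hash=True)   # always write an attribute-based __hash__
class Always:
    x = attr.ib()

@attr.s(hash=False)  # meant as "leave __hash__ alone"; in practice it currently falls back to hashing by id (see below)
class Never:
    x = attr.ib()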
I propose we use an Enum (probably just a class on legacy Python) a la:
class Hashing(Enum):
    SMART = "smart"                    # None → new default
    BY_ID = "by_id"                    # False
    WRITE_FOR_ALL = "write_for_all"    # True
    WRITE_FOR_NONE = "write_for_none"  # True, but the default on attr.ibs is False
The last two names are terrible of course.
I probably could be talked into depending on https://pypi.org/project/enum34/ on legacy Python.
Opinions?
P.S. Now while typing this out I realized that this could solve a long-standing complaint both I and many users have: assuming you want an attrs-generated method based on a small subset of attr.ibs, you have to write a lot of hash|init|repr=False. Ditching bools for enums would allow for that. 🤔 But let’s talk about that in a different ticket.
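For illustration, the kind of boilerplate meant here (a made-up class, today's API):

import attr

@attr.s
class Big:
    # Only `key` should end up in __init__, __repr__, and __hash__;
    # today every other attribute has to opt out individually.
    key = attr.ib()
    a = attr.ib(default=None, init=False, repr=False, hash=False)
    b = attr.ib(default=None, init=False, repr=False, hash=False)
    c = attr.ib(default=None, init=False, repr=False, hash=False)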
We should definitely do something to make the hashing behavior clearer.
I'm currently working on a PR to "fix" slots hashing behavior, and attr.s(hash=False) is really throwing me off.
The docs say nothing about what should happen when hash=False.
My personal understanding would be for attrs to do exactly nothing: we're saying "do not touch __hash__". However, this is more complicated than it looks. I will demonstrate.
Assume a very simple class, with hashing disabled:
@attr.s(hash=False)
class A:
    a = attr.ib()
The equivalent of this class is: https://gist.github.com/Tinche/d8c1dadfaa3aa4618b736a61e752b3e1
Try pasting them both. The attrs one will be hashable, due to falling back to object.__hash__. The normal one will not be.
This is because in the normal class, the interpreter sees the __eq__ in the class body and inserts __hash__ = None. In the attrs class, the __eq__ method isn't there when the class is defined; we put it into the class after the fact. So the interpreter won't touch __hash__ and it falls through to object.
This is the root cause of why our slots behavior is different.
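A minimal attrs-free sketch of that mechanism (Python 3; the class names are just for illustration):

# __eq__ in the class body: the interpreter sets __hash__ = None at class creation.
class InBody:
    def __eq__(self, other):
        return True

# __eq__ bolted on after class creation (what attrs does for dict classes):
# __hash__ is never touched and still resolves to object.__hash__.
class AfterTheFact:
    pass

AfterTheFact.__eq__ = lambda self, other: True

print(hash(AfterTheFact()))  # works, id-based
print(hash(InBody()))        # TypeError: unhashable type: 'InBody'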
Yeah so that’s settled.
Ideas for better names for WRITE_FOR_ALL/NONE? @glyph? @njsmith?
P.S. the replacement/slots thingie explains a lot of my own confusion.
What functionalities are we aiming for exactly?
I'm guessing:
- make hashable by generating a hash function based on attributes
- make hashable, but attributes are opt-in instead of opt-out
- make hashable by using identity-based hashing
- make unhashable
- delegate to an existing self.__hash__, whatever that may be
(edit) Also:
- smart
I don't really understand the original post, but regarding the cases that need to be supported: I think it helps narrow things down to remember that for any given __eq__ there are exactly two valid ways of defining __hash__. Either it must include exactly the same things that go in the __eq__ calculation, or else you have to set __hash__ = None. This is just a fact about the language; if you violate it then dicts stop working.
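To make "dicts stop working" concrete, a small sketch (plain Python, not attrs-specific) of an __eq__/__hash__ pair that violates that rule:

class Broken:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        # Equality looks at self.x ...
        return isinstance(other, Broken) and self.x == other.x

    def __hash__(self):
        # ... but the hash ignores it, so equal objects (usually) hash differently.
        return id(self)

d = {Broken(1): "stored"}
print(Broken(1) in d)  # almost always False: the equal key ends up in a different bucket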
So if attrs is generating its own __eq__, then the only meaningful decision is "do you want this object to be hashable?", and right now the three answers are "yes" (True), "no" (False), and "guess" (None, meaning yes for immutable objects and no otherwise). (Except apparently the "no" case is buggy? I'm pretty sure that the intention was for cmp=True, hash=False to make an unhashable class.) And then on a per attribute level you can also say yes/no/guess, but I can't think of any time that "guess" is wrong; the other options just give you interesting ways to break your class.
And if attrs isn't generating its own __eq__, then not sure why it would ever want to generate a __hash__. If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular python behavior imo.
That said, I can see the case for having nicer shorthands for controlling which attribs go into the __eq__ calculation, like toggling it between opt-in and opt-out. But there's no reason to give __hash__ a separate set of toggles here; it should follow what __eq__ does.
Actually, you specifically asked for being able to fall back to hashing by ID – that’s why it’s in there. :) And I think that’s a great and useful feature, so I’m not trying to push the blame in your direction – just reminding. :)
The point of this ticket isn’t really to invent new behaviors, it’s just to make the existing ones more understandable. Especially because one of them happened by accident (see also #205).
But maybe…how about this:
class Hashing(Enum):
    SMART = "smart"             # None → new default
    UNHASHABLE = "unhashable"   # same as None for unfrozen
    BY_ID = "by_id"             # False
    BY_ATTRS = "opt_out"        # True
    BY_ATTRS_OPT_IN = "opt_in"  # True, but the default on attr.ibs is False
Falling back to hashing by id is definitely useful! If, and only if, we are also falling back to doing __eq__ by id. That falls under my line about "If the user is taking responsibility for __eq__ then we should stay out of the way and give them the regular Python behavior".
Yeah so the fallback kind of happened by accident (cf. Tin’s explanations) but this time we have the chance to fix it properly including the deprecation year if we switch to enums.
The thing where cmp=True, hash=False ends up with a generated __eq__ combined with __hash__-by-id just sounds like a straightforward bug to me. It just makes it so you can have two objects that compare equal but if you put them both into the same set then it's literally random whether you end up with 1 object added or 2 objects added.
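A sketch of that failure mode, assuming the current behavior where hash=False on a dict class leaves object.__hash__ in place:

import attr

@attr.s(cmp=True, hash=False)  # generated __eq__, but __hash__ stays id-based
class C:
    x = attr.ib()

a, b = C(1), C(1)
print(a == b)       # True: equality compares attributes
print(len({a, b}))  # 2 in practice: the two hashes differ, even though the elements compare equal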
Beyond that I'm mostly just kinda lost about what all these enums are supposed to be or what the overall goal is here. I think I showed up in the middle of the conversation :-)
So I guess we have to break bw-compat again and set __hash__ to None by hand if __eq__ is set? :|
Woah there, Tex. Ease your finger up off that trigger :-).
In this case I'm not convinced a breaking change is necessary, because there's no alternate behavior to suggest: a warning would be sufficient. The user has explicitly asked for nonsense, and if the nonsense appears to work today then it's fine to deprecate it for a while before raising an error.
Well the problem (which Tin explained elsewhere, I'm gonna add it here for posterity) is: normal Python class behavior is to set __hash__ to None if there's an __eq__. If not, it inherits object's by-id semantics. Now our __eq__ is added after Python makes this decision (except when using slots), so object's __hash__ remains.
If slots is true it behaves differently because we build the class differently and it becomes unhashable.
IOW
- we have conflicting behavior between slotted and not,
- and hash=False doesn't actually mean what it looks like it means ("don't do anything about __hash__") because of a weird side effect.
Dunno. :|
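To make the discrepancy concrete, per the behavior described above and the demonstration earlier in the thread:

import attr

@attr.s(hash=False)              # dict class: __eq__ is added after creation, object.__hash__ survives
class D:
    x = attr.ib()

@attr.s(hash=False, slots=True)  # slotted class: built fresh with __eq__ present, so __hash__ becomes None
class S:
    x = attr.ib()

print(hash(D(1)))  # works, id-based
print(hash(S(1)))  # TypeError: unhashable type: 'S'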
Now our __eq__ is added after Python makes this decision (except when using slots), so object's __hash__ remains
Oh, that's deeply unfortunate. I was wondering if the modifying-the-class-after-the-fact thing would come to bite us at some point; looks like it has. Perhaps we should be throwing away the first type and replacing it with a fresh one? (Although that's probably some even worse / more horrible compatibility break than this...)
and hash=False doesn't actually mean what it looks like it means ("don't do anything about __hash__") because of a weird side effect.
Focusing first on getting the exact right behavior with the new, good enum would be the right way to prioritize this. If we need another compat break (and gosh I hope not) then that's a separate thing.
JFTR, I started to work on this and ran into another “interesting” fact: the behavior Nathaniel described is specific to Python 3. Python 2 will happily hash by ID if there's an __eq__.