ceylon icon indicating copy to clipboard operation
ceylon copied to clipboard

Auto-generating equals and hash based on annotated attributes

Open CeylonMigrationBot opened this issue 11 years ago • 40 comments

[@lucaswerkmeister] I thought I’d open a dedicated issue for @gavinking’s suggestion here:

I would like to have a system based on annotations that produces equals() and hash at once.

If I understand that correctly, you would annotate some attributes identifying (or a different name), and you would get equals() and hash that used equals() / hash of all attributes annotated identifying.

Open questions:

  1. Is this implemented by the compiler? It’s probably possible to just make this the behavior of Object.equals() and Object.hash using the metamodel.
  2. If it’s generated by the compiler: What if there are identifying attributes, but equals() or hash was also overridden?
    1. Always disallow that. Possibly inconvenient when you inherit a class with identifying attributes, but want to implement your own equals().
    2. Only allow it if both equals() and hash are overridden, make it an error to override only one of them.
    3. Allow it. If only one is overridden, keep the “clever” version of the other.
    4. Allow it. If only one is overridden, fall back to the “dumb” version of the other.

To make any sense, this probably requires identifying to be an inherited annotation. But in #4020 @FroMage said that actual members can’t inherit annotations :-(

[Migrated from ceylon/ceylon-spec#971]

CeylonMigrationBot avatar May 26 '14 15:05 CeylonMigrationBot

[@tombentley] I would expect a refined equals() or hash in the presence of identifying to be an error.

I would also expect the compiler to generate the equals() and hash implementations. Ideally the order of the attributes involved would be something the programmer could convey (I guess declaration order would be a sufficient hint), take for example:

class OrderMatters(int, list) {
    identifying Integer int;
    identifying List<String> list;
}

For equals() we really want to do int == other.int && list == other.list rather than list == other.list && int == other.int in order to avoid doing the more expensive == if possible.

The obvious extension would be ordering or comparing to generate the compare() method of Comparable. Then the typechecker could warn about/reject Comparable being inconsistent with equals().

CeylonMigrationBot avatar May 26 '14 16:05 CeylonMigrationBot

[@lucaswerkmeister]

The obvious extension would be ordering or comparing to generate the compare() method of Comparable. Then the typechecker could warn about/reject Comparable being inconsistent with equals().

The bad case is “two elements are equals(), but not equal”. To prevent this, there must not be an attribute that’s comparing but not identifying.

I’m tempted to put these into a single annotation with a boolean parameter – identifying { ordering = true; } – but that seems less readable than identifying ordering.

(We could also say that ordering implies identifying, but that’s probably even more confusing.)

CeylonMigrationBot avatar May 26 '14 16:05 CeylonMigrationBot

[@lucaswerkmeister] @gavinking mentioned that string should also be auto-generated. How would the resulting string look like? (If the identifying attributes were exactly the initializer parameters, then ClassName(attr1.string, attr2.string) would make sense, but we’re currently not enforcing that.)

CeylonMigrationBot avatar Jun 10 '14 14:06 CeylonMigrationBot

[@quintesse] I don't think it should necessarily look like an instantiation, so maybe we could go for something that obviously ain't, let's say ClassName [foo, bar]. I'd also go for named parameters: ClassName [attr1=foo, attr2=bar]. If there are no attributes just plain ClassName and objectName (nice for enumerations). Also we could add maybe an argument to the annotation to filter out attributes we don't want to appear in the string. And finally we should probably put a cap on the length of the values, like ClassName [attr1=foo, attr2=blahblahblahblahblahblahblahblah...]. Strings are for human consumption, people should not be tempted to try and use them as a kind of serialization.

CeylonMigrationBot avatar Jun 10 '14 15:06 CeylonMigrationBot

[@lucaswerkmeister]

ClassName [attr1=foo, attr2=bar]

Looks good

If there are no attributes

Then how do we even know that we’re supposed to auto-generate string? When there’s no identifying annotation anywhere?

And finally we should probably put a cap on the length of the values

Agreed, like Iterable.string limits itself to 30 elements (although it doesn’t limit the length of the individual elements).

CeylonMigrationBot avatar Jun 10 '14 15:06 CeylonMigrationBot

[@DiegoCoronel] Hi,

Lombok it out for a long time, maybe theres something you can learn from them:

@ToString: http://projectlombok.org/features/ToString.html @EqualsAndHashCode: http://projectlombok.org/features/EqualsAndHashCode.html (this is interesting) All annotations: http://projectlombok.org/features/index.html

This is just to try to help with some ideas.

2014-06-10 11:16 GMT-04:00 Lucas Werkmeister [email protected]:

ClassName [attr1=foo, attr2=bar]

Looks good

If there are no attributes

Then how do we even know that we’re supposed to auto-generate string? When there’s no identifying annotation anywhere?

And finally we should probably put a cap on the length of the values

Agreed, like Iterable.string limits itself to 30 elements (although it doesn’t limit the length of the individual elements).

— Reply to this email directly or view it on GitHub <#4077#issuecomment-45627487>.

CeylonMigrationBot avatar Jun 10 '14 15:06 CeylonMigrationBot

[@gavinking] It should be:

Point { x=0.0; y=0.0; }

of course!

CeylonMigrationBot avatar Jun 10 '14 15:06 CeylonMigrationBot

[@quintesse]

Then how do we even know that we’re supposed to auto-generate string?

Well a bit like @DiegoCoronel shows in his message, I actually think they aren't directly related.

CeylonMigrationBot avatar Jun 10 '14 16:06 CeylonMigrationBot

[@lucaswerkmeister]

It should be:

Point { x=0.0; y=0.0; }

of course!

I like it (and in fact I’m doing this very thing in ceylon.ast – sorry @quintesse :D ), but it’s very misleading if your identifying attributes are not initializer parameters. (It looks like a Ceylon expression for a clone of the instance, but it’s not – the named parameter doesn’t even exist!) That, along with @DiegoCoronel’s links (by the way, Diego, you mentioned someone with the GitHub handle “ToString” :D ), makes me think that perhaps the annotation shouldn’t be on the individual attributes, but on the class instead: If the class is annotated simple or container or something (not basic, conflicts with the existing Basic’s different equals() and hash), then equals(), hash and string are constructed from the class’s initializer parameters.

This is, strictly speaking, less powerful than an attribute-level identifying annotation, but it’s easier to use and seems to make more sense to me. (If you do stuff like shared Instant creationDate = now(), you’ll have to write your own implementations, though.)

(I’m also not sure how this would play together with constructors – #3902.)

CeylonMigrationBot avatar Jun 10 '14 18:06 CeylonMigrationBot

[@quintesse] I don't mind, it's just text, but it does seem to be an instantiation which it is not. And I'd still use only the name if no attributes exist, this whole thing about string started exactly because of the wish to have enumerated types be more useful by default. Defining identifying class Direction() of north, south, east, west {} and then having the items be printed as north {} and south {} is not what we want.

CeylonMigrationBot avatar Jun 10 '14 19:06 CeylonMigrationBot

[@lucaswerkmeister] Okay, but north {} also isn’t the valid Ceylon expression to get north – that would be north, which is exactly what you want. So, in general, the generated string of any object would just return the object’s name.

But what if Direction, north, etc. are inner members? Is their string outer.string + "." + myString?

CeylonMigrationBot avatar Jun 10 '14 19:06 CeylonMigrationBot

[@lucaswerkmeister]

I don't mind, it's just text, but it does seem to be an instantiation which it is not.

Well, for me, it’s supposed to be a valid instantiation (or rather, expression, because of objects), because IMO named arguments are a very nice and very well readable representation of a data structure, and I don’t see why they shouldn’t be used for string. Isn’t it more regular to re-use the syntax Point { x=0.0; y=0.0; } for string instead of a new syntax Point [x=0.0, y=0.0]?

CeylonMigrationBot avatar Jun 10 '14 19:06 CeylonMigrationBot

[@quintesse]

Well, for me, it’s supposed to be a valid instantiation

Because it's very well possible that the instantiation arguments have nothing to do with the attributes you're going to mark as identifying, so even if it looks like a valid instantiation it might not be. But I'm not saying that's a big problem or anything, but it might be confusing.

instead of a new syntax

Syntax? There's no syntax here, it's just a text meant for humans to read.

CeylonMigrationBot avatar Jun 10 '14 19:06 CeylonMigrationBot

[@lucaswerkmeister]

Because it's very well possible that the instantiation arguments have nothing to do with the attributes you're going to mark as identifying, so even if it looks like a valid instantiation it might not be. But I'm not saying that's a big problem or anything, but it might be confusing.

I fully agree with you there :D

I like it […] but it’s very misleading if your identifying attributes are not initializer parameters. […] That […] makes me think that perhaps the annotation shouldn’t be on the individual attributes, but on the class instead: If the class is annotated simple or container or something […], then equals(), hash and string are constructed from the class’s initializer parameters.

(me here, emphasis not in original comment)

CeylonMigrationBot avatar Jun 10 '14 19:06 CeylonMigrationBot

[@lucaswerkmeister]

Syntax? There's no syntax here, it's just a text meant for humans to read.

Okay, format then, or something. It’s a regular format the compiler will be generating, so all the auto strings will look the same.

CeylonMigrationBot avatar Jun 10 '14 20:06 CeylonMigrationBot

[@gavinking] Look, if there are no attributes marked for inclusion in the string, then the format is just the class (or object) name. Easy.

CeylonMigrationBot avatar Jun 10 '14 20:06 CeylonMigrationBot

[@quintesse]

then the format is just the class (or object) name

Which is what I said ;)

perhaps the annotation shouldn’t be on the individual attributes, but on the class instead

We could have both. A class-level one to indicate we want those things to be generated and then optional field-level ones to indicate which ones are to be used for string/equals/hash(/compare). That way if you leave them out by default we could use the initializer attributes.

The thing is that I think people never really thought about serialization in Java, so you have this "transient" thing but people tend to just not think about it much. In the same way I'd maybe prefer people to be explicit about which fields are "identifying" and which are not.

CeylonMigrationBot avatar Jun 10 '14 22:06 CeylonMigrationBot

[@lucaswerkmeister]

We could have both. A class-level one to indicate we want those things to be generated and then optional field-level ones to indicate which ones are to be used for string/equals/hash(/compare). That way if you leave them out by default we could use the initializer attributes.

I like that. There’s still the danger that the string can look like a valid initialization expression even though it’s not (because you marked other fields as identifying), but as it only happens when you the developer explicitly add extra annotations, we can say “you should know what you’re doing” and “not our fault” :) (After all, this is supposed to be information for developers only; you shouldn’t present string to the end user, right?) It might help to allow people to override string without overriding equals() and hash (while we should probably disallow overriding only one of equals() and hash because of consistency).

CeylonMigrationBot avatar Jun 10 '14 22:06 CeylonMigrationBot

[@quintesse]

you shouldn’t present string to the end user, right?

in general, right, more for debugging and logging, although our built-in basic value types like Integer and String (and maybe future user value types) are obvious exceptions. (although even then formatters should probably be preferred)

while we should probably disallow overriding only one of equals() and hash because of consistency

I have always thought this would be a nice compiler warning to have btw, in general, unrelated to this issue. "Warning: overriding equals() without overriding hash()"

CeylonMigrationBot avatar Jun 10 '14 22:06 CeylonMigrationBot

[@lucaswerkmeister] @renatoathaydes suggested that this could be added using macros (#1865), which I think is a really elegant way to do it (no extra complexity in the compiler itself).

CeylonMigrationBot avatar Oct 17 '14 21:10 CeylonMigrationBot

We kinda let this issue die because we were planning on doing it via AST transformers. Since it seems to me that we're no longer very interested in going down that path, and since this feature isn't especially hard to implement we should figure out what we want it to look like.

I can see two obvious ways to identify the "identity" fields of a class:

shared class Point(shared identity Integer x, shared identity Integer y, shared String label) {}

Or:

identity(value x, value y)
shared class Point(shared Integer x, shared Integer y, shared String label) {}

I think I prefer the second option, since:

  • it's much easier to see at a glance which fields comprise the identity of the object, and
  • it allows the order of comparisons to be explicit (which is especially useful for Comparable, but also for equals()).

OTOH, it is strictly more verbose.

gavinking avatar Sep 14 '17 10:09 gavinking

Also, this should be allowed:

shared identity class Point(shared Integer x, shared Integer y, shared Integer label) {}

This would default to using all attributes for calculating equals and hashCode and all shared attributes for string. Comparison order should be based on the field declaration order.

And the compile time code generation should be skipped for methods that have hand-crafted implementations present in the source code.

luolong avatar Sep 14 '17 12:09 luolong

Inclined to the second option, as absolutely clearer and ceylonic.

But I can think of a third option: "identity" being an static attribute (maybe for the Identifiable class), that can be override in classes:

shared class Point(shared Integer x, shared Integer y, shared String label) {
   shared static actual [GettableDeclaration+]? identity = [value x, value y];
}

Pros:

  • Easily overridable
  • No need for new constructs nor changes on the compiler.

Cons:

  • Clutter classes with an undesired attribute.
  • Metamodel reference to local declaration (despite I guess option 2 also steps here)

Not saying it will be the best option, but worth dropping the idea here.

someth2say avatar Sep 14 '17 12:09 someth2say

Now, just looking at this example above, the annotation name identity feels weird... I would love to replace that with value instead...

But value is reserved keyword, so that can't be used

luolong avatar Sep 14 '17 12:09 luolong

I agree with @luolong.

  1. Should be allowed identity annotation without parameters.
  2. identity is not good name, it's confusing in terms equality and identity, cos inverse meaning of the word. May be equality more proper, but not too good yet.

qdzo avatar Sep 14 '17 19:09 qdzo

What about the idea that all annotations should be adjectives?

ePaul avatar Sep 15 '17 20:09 ePaul

I also though about that, but the only thing coming to my mind is identifiedBy . Not pretty.

someth2say avatar Sep 15 '17 20:09 someth2say

How about simply naming the annotation after the interface that it actually satisfies: identifiable?

phlopsi avatar Sep 16 '17 08:09 phlopsi

@phlopsi value equality is declared by the Object class, not by the Identifiable interface. Identifiable allows use of ===.

gavinking avatar Sep 16 '17 20:09 gavinking

How about shared hashed Integer x? Can also be

hashed(value x, value y)
// or hashedBy(value x, value y)
shared class Point(shared Integer x, shared Integer y, shared String label) {}

It implies that the attribute contributes to hashCode and, by a stratch of imagination, must contribute to object's equality comparison.

Another line of thought: shared observed Integer x, meaning that attributes which don't play role in equals/hashCode are not observable from code that relies on equals/hashCode.

A slightly more granular approach :

equalBy(value x, value y)
hashedBy(value x, value y) 
shared class Point(shared Integer x, shared Integer y, shared String label) {}

@gavinking wdyt?

azolotko avatar Sep 21 '18 22:09 azolotko