ext-ds icon indicating copy to clipboard operation
ext-ds copied to clipboard

Implement the Hashable interface on Container classes ?

Open castarco opened this issue 9 years ago • 14 comments

Hi,

is there a reason to not implement the Hashable interface on the "container" classes? I think that it will be very useful in order to compare containers.

It seems that the current implementation forces users to loop over collections to compare them. I know that if this comparison is implemented as a method, it will work the same way, but I think is better to implement the comparison one single time rather than reimplement it again and again in "userland".

castarco avatar Oct 16 '16 16:10 castarco

I like this, but I'm not sure how I would handle the hash value. I would prefer to implement an equals method on Collection, but not implement Hashable.

rtheunissen avatar Oct 24 '16 20:10 rtheunissen

Hi @rtheunissen

any plan about this equals feature ? Can help ?

Just see that new \Ds\Set([1, 3]) == new \Ds\Set([1, 2, 3]) returns true and it makes my unit tests very sad.

tyx avatar Nov 15 '16 13:11 tyx

Ok, this is not a BC break feature... any chance to get it backported to 1.x or does it need a big internal refactoring?

shouze avatar Nov 28 '16 08:11 shouze

Doesn't need internal refactoring, I'm really just lumping everything into 2.0 because it's my active branch. Happy to implement in 1.x now and just merge into 2.0.

rtheunissen avatar Nov 28 '16 09:11 rtheunissen

See #67 for status on equals.

If we did implement Hashable on collections, they would simply use spl_object_hash and $this === $that, which is what the extension already falls back on.

We either have to use and support equals everywhere, or not at all.

An example of a consistency I want to avoid at all costs:

$a = new Ds\Set([1, 2, 3]);
$b = new Ds\Set(["1", "2", "3"]);

$a == $b;       // true, because of loose property comparison.
$a === $b;      // false, because they are not the same instance.
$a->equals($b); // false, because the sets don't contain the same values.

The inconsistency is between the == and the equals, which is something we can fix in the extension, but not the polyfill. This is the primary blocker. There is no way to implement "not the same instance but strict comparison of properties" in userland PHP as at 7.1.

Like https://github.com/atoum/atoum/issues/671 mentions, even if we implement equals on Collection,== won't behave as expected and won't resolve the problem outlined in this issue.

rtheunissen avatar Feb 12 '17 01:02 rtheunissen

As mentioned on another ticket it might be good to extract the equals method into an Equatable interface which Hashable extends from. I'm not sure collections should be Hashable but they ought to be able to check for equality.

The issues with ==, ===, and equals are not that important in my opinion. Users cannot use == and === on userland objects reliably already, so providing equals and documenting it is the best way forward, in my opinion.

morrisonlevi avatar Nov 29 '17 16:11 morrisonlevi

Usually an Equatable, Equalable, Identifiable, … interface goes hand in hand with the hash method because they actually just compare the hashes of the objects.

interface Identifiable {
    function equals($other): bool;
    function hash(): string;
}

trait IdentifiableTrait {
    public function equals($other): bool {
        return $other instanceof $this && $other->hash() === $this->hash();
    }

    public function hash(): string {
        return \spl_object_hash();
    }
}

final class SomeEntity implements Identifiable {
    use IdentifiableTrait;

    private $id;

    public function hash(): string {
        return \md5(__CLASS__ . $this->id, \false);
    }
}

final class SomeValueObject implements Identifiable {
    use IdentifiableTrait;

    private $some_prop;

    public function hash(): string {
        return \md5(\var_export($this, \true), \false);
    }
}

This might be a way how you could implement it.

Fleshgrinder avatar Nov 29 '17 16:11 Fleshgrinder

Usually an Equatable, Equalable, Identifiable, … interface goes hand in hand with the hash method because they actually just compare the hashes of the objects.

I'm not sure where you get that idea but that is fundamentally broken. The whole reason equals is required by Hashable is to distinguish between two objects whose hashes collide but they aren't the same. The hash gets you into the correct bucket or region and then you iterate until you find one that equals matches.

Edit to expand on this a bit: If $lhs->equals($rhs) is true then $lhs->hash() == $rhs->hash() must also be true. However the latter does not imply the former, meaning that just because two hashes are considered equal does not mean the objects are. This is because fast hash values (which we want) are typically lossy. Their goal is to get us into the correct bucket or region and then we can use the more expensive but completely accurate equals to narrow in on the exact location.

morrisonlevi avatar Nov 29 '17 17:11 morrisonlevi

The purpose of the hash function as it is used here in the extension is what you describe. However, that is different in programming languages that have equality built-in, e.g.:

Also, as explained in #67, lhs.equals(rhs) does not imply lhs.hash == rhs.hash for all types, only for most.

Fleshgrinder avatar Nov 29 '17 17:11 Fleshgrinder

Also, as explained in #67, lhs.equals(rhs) does not imply lhs.hash == rhs.hash for all types, only for most.

Can you directly link me to the explanation? I see some type-juggling issues but not where two objects which are of the same type, are Hashable, and are equal by their equals call do not have the same hash code.

morrisonlevi avatar Nov 29 '17 18:11 morrisonlevi

@morrisonlevi there you go: https://github.com/php-ds/extension/issues/67#issuecomment-347663830

Fleshgrinder avatar Nov 29 '17 19:11 Fleshgrinder

Floating point values are not objects, do not implement Hashable, and thus do not have an equals method. While INF and NAN semantics are something to be aware of it's not the same case as I'm talking about here. In every case where you have a Hashable object you must use the semantics outlined or your type is not well-behaved and will not work properly in the collection.

morrisonlevi avatar Nov 29 '17 20:11 morrisonlevi

Not in PHP, that is true. What about two DateTime instances that have the same time but are in different time zones? Are they equal? Probably. Do they have the same hash? Probably not.

I’m only trying to make clear that partial equality and ordering is something that is true for some types. The constraint you are laying out is normal for the keys of hash map implementations (if x == y then x.hash == y.hash) because they are broken if that does not hold.

Fleshgrinder avatar Nov 29 '17 21:11 Fleshgrinder

I think the decision here is that only Hashable types should be supported by hash-based structures, and the contract assumes that if x == y then x.hash == y.hash. Containers that are mutable will not implement Hashable. You would need to convert to an immutable type first.

Hashable and Equatable should be separated.

rtheunissen avatar Oct 17 '18 17:10 rtheunissen