Equality comparison is incorrect for `CategoricalMatrix`
Comparing one CategoricalMatrix against another always yields False even if their contents are identical:
import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1 == mat2
False
I would not say that it is 'incorrect'. The equality magic method has not been implemented in tabmat. Therefore, CategoricalMatrix, which does not inherit from a class, uses the default, which verifies the identity of the object. In your above example, mat1 == mat1 is true because it has the same identity.
I am not sure what is the use case for checking equality. Could you please tell us why you want to check this?
If you need a fast solution, I would suggest you test for equality using the underlying attribute. In the case of CategoricalMatrix, the data is stored in the .cat attribute. Thus, this will work:
import tabmat
mat1 = tabmat.CategoricalMatrix([1,2,3])
mat2 = tabmat.CategoricalMatrix([1,2,3])
mat1.cat == mat2.cat
Similar to DenseMatrix and SparseMatrix, it will return a boolean array that has the same length as the input array of the CategoricalMatrix.
Thanks for the answer. I think it'd be nice to have the equality comparison return a 1-d boolean array (as opposed to a single value or to a densified 2-d array), as if one had accessed .cat.
This is useful for example when writing unit tests where this package is used, or when creating new features from these matrix objects.
I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is (3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMO mat1 == mat2 should have the same behavior as mat_1.A == mat_2.A.
For downstream purposes, would subclassing CategoricalMatrix and overriding its __eq__ method to return self.cat == other.cat be a solution to your use case?
I'm not totally convinced that equality comparison should return a 1-d array, even if it gets implemented. For example, the shape of your example matrices is
(3, 3), and the 1-d array is just an internal representation of a numeric matrix containing 0s and 1s. It would be more intuitive to me if element-wise operations returned arrays with the same shape as the inputs. Another way to look at it is that IMOmat1 == mat2should have the same behavior asmat_1.A == mat_2.A.For downstream purposes, would subclassing
CategoricalMatrixand overriding its__eq__method to returnself.cat == other.catbe a solution to your use case?
One could also argue that the raw shape is (3,1) whereas an OHE representation would be (3,3).
And yes, thanks - for my purposes even just calling .cat would be enough, but would be nicer to have such functionality implemented at the class level, the moreso since other classes like DenseMatrix do have equality comparisons implemented (guess they might be inherited from NumPy but they work as intended).