Incorrect or inconsistent ZonedDateTime handling in operations
Firstly, the over-arching issues:
-
ZonedDateTimeinstances are not equal to otherZonedDateTimeinstances with different time zones, and so when we do convert to alongfor hashing or sorting we are changing the definition of equals that applies. - Our "automatic" conversion code in
ReinterpretUtils.maybeConvertToPrimitiverightlfully requires support for symmetric conversion from theColumnSourceto be reinterpreted (i.e.source instanceof ConvertibleTimeSource.Zoned) and thus a single time zone for the entireColumnSource.
Secondly, the consistency issues:
-
naturalJoinandjoinare never reinterpretingZonedDateTimekey sources tolongfor hashing. Hence they are always applyingZonedDateTime.hashCode()andZonedDateTime.equals(). This is internally consistent within the operations, at least, but inconsistent with ~other operations~aj. -
ajis usingReinterpretUtils.maybeConvertToPrimitive, and so we might reinterpret only one side, resulting in an error that seems inexplicable to users. We will also be getting equality "wrong" (by the Java definition) if the two sides have different time zones. -
aggByconverts symmetrically, which is good. ~However, it is effectively picking different definitions of equality depending on the provenance of theZonedDateTimekey source.~ With currentmaybeConvertToPrimitive, we're always correct. -
sorthas basically the same issue asaggBy. ~While we're always using a comparison that is consistent with equality, which definition of comparison and equality we use depends on the source.~ With currentmaybeConvertToPrimitive, we're always correct.
Solutions:
- Remove
ZonedDateTimesupport frommaybeConvertToPrimitive. This standardizes on Java's definition of comparison and equality, but eliminates opportunities for optimization. - Remove the restriction that we only convert to primitive when reversible. This requires extra work to figure out what our result time zone should be in many cases. If we have a join with mismatched
ConvertibleTimeSource.Zoned.getZone()results, we need to error out or "pick a winner". Worse, if we have to convert back from an un-zonedlongsource, we need to pick a zone.DateTimeUtils.timeZone(), e.g. the system default? - Keep
maybeConvertToPrimitivethe same. For joins, we should only convert if both sides are reinterpretable and have the same fixed zone.
~I think we should pick option (2), as that renders it less fraught to reinterpret between Instant and ZonedDateTime. Otherwise, this reinterpretation changes the meaning of equality, etc, for the data within the column.~
I think we should pick option (3). It means zone matters for equality and comparison. It keeps aggBy and sort correct with current optimizations. We could eventually optimize naturalJoin and join, but they are correct as-is. We would have to fix a bug in aj that might result in error messages or incorrect equality/comparability.
@lbooker42 We need to do two things:
- Fix
ajto ensure that we only reinterpret one side if we can reinterpret both sides. - Maybe reinterpret
ZonedDataTimeinnaturalJoinandjoinIF we can reinterpret both sides. - Probably never reinterpret
ZonedDateTimeinwhereIn, unless we can refactor to know the set table and the source table at the same time.
https://github.com/deephaven/deephaven-core/pull/5780#discussion_r1684961442 We may need to revisit column stats for ZDTs.
For aj, we only need comparability not equality; so reinterpreting should always be fine - as long as we do it consistently.