cassandra icon indicating copy to clipboard operation
cassandra copied to clipboard

CASSANDRA-18879 Fix datetime_from_utc_to_local in cqlshlib

Open arkn98 opened this issue 2 years ago • 11 comments

CASSANDRA-18879

UTC offsets for a timezone can change over time. The current datetime_from_utc_to_local doesn't handle daylight savings, and uses the current UTC offset for timezones that may have had a different UTC offset in the past (see comments from [1]).

Also, the UTC datetime objects obtained from the Datastax Python driver are naive (see [2]). Thus, to convert them to the local timezone, we need to make them UTC "aware" objects first (using replace()) before calling astimezone() (see [3]).

[1] https://stackoverflow.com/a/19238551/9523462 [2] https://docs.datastax.com/en/developer/python-driver/3.28/dates_and_times/#read-path [3] https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone

arkn98 avatar Dec 16 '23 10:12 arkn98

@arkn98 instead of using datetime_from_utc_to_local can't we just change

from:  str(datetime_from_utc_to_local(event.datetime))
to:      str(event.datetime.astimezone())

Python 3.3+ allows omitting the TZ in astimezone() to convert to local. That was changed in https://peps.python.org/pep-0495/#conversion-from-naive-to-aware

bschoening avatar Dec 19 '23 02:12 bschoening

@bschoening Thanks for the review!

PEP-495 did bring about that change. But, to get the expected behavior, astimezone() should be called on "aware" datetime objects. When you call astimezone() on a naive datetime object, it assumes the object to be in the system timezone. The Datastax driver gives out naive UTC datetime objects---trace.started_at and event.datetime directly come from the CQL queries from here and here.

If self is naive, it is presumed to represent time in the system timezone.

Here's an example that illustrates the problem. Assume it is 00:00 in New York, which corresponds to 05:00 in UTC.

>>> utc_naive = datetime.datetime(2023, 11, 25, 5, 0, 0)
>>> print(utc_naive)                   # this is what we get from the driver
2023-11-25 05:00:00
>>> print(utc_naive.astimezone())
2023-11-25 05:00:00-05:00              # which is wrong; it should be 12AM in New York


>>> utc_aware = utc_naive.replace(tzinfo=timezone.utc)
>>> print(utc_aware)
2023-11-25 05:00:00+00:00
>>> print(utc_aware.astimezone())
2023-11-25 00:00:00-05:00             # this is correct

arkn98 avatar Dec 19 '23 15:12 arkn98

@arkn98 thanks for the very helpful explanation and links. It seem the driver is out-of-sync with Python 3 and should be returning aware object in UTC.

Could you add a comment to datetime_from_utc_to_local that this function is necessary because the driver doesn't set the UTC timezone, which is required before converting to local time zone.

bschoening avatar Dec 19 '23 17:12 bschoening

@bschoening I think it is an intentional design choice (at least in the Datastax drivers). They talk about the reasoning here and here.

arkn98 avatar Dec 19 '23 22:12 arkn98

I see, but the python driver docs are out of date -- pytz is no longer relevant with Python 3.9+ and shouldn't be used after EOL for 3.8 in Oct 2024.

Renovating datetime_from_utc_to_local, as your PR does, aligns it with more modern python use of timezones. Thanks for your contribution.

bschoening avatar Dec 19 '23 22:12 bschoening

It's nothing. And, I took so long. Thanks for putting up with me :sweat_smile:

I'll add that comment to the function. Should I add anything else to this PR (like test cases)?

arkn98 avatar Dec 19 '23 23:12 arkn98

Since datetime_from_utc_to_local (now) just chains two stdlib functions, I don't see a benefit to unit tests for it.

bschoening avatar Dec 20 '23 03:12 bschoening

Alright. The PR is ready for review. I'm not sure what to enter for "Test and Documentation Plan" on the Jira ticket.

arkn98 avatar Dec 20 '23 04:12 arkn98

@arkn98 Were you able to test it manually? You can put a comment in the Test and Documentation field and change the status to 'Submit Patch'.

bschoening avatar Dec 20 '23 04:12 bschoening

@bschoening, One quick question. I just saw the display_timezone config. Shouldn't the tracing info also be printed using that value, instead of the local timezone?

arkn98 avatar Dec 20 '23 22:12 arkn98

@arkn98 Perhaps but that would be a different Jira for a bug. It would change existing behavior, and this Jira is just a code refactoring.

bschoening avatar Dec 21 '23 17:12 bschoening

Committed as bfb5c59

arkn98 avatar Mar 08 '24 03:03 arkn98