python-driver icon indicating copy to clipboard operation
python-driver copied to clipboard

Introduce Datetime type for ranges outside datetime.[MIN|MAX]YEAR

Open sylwiaszunejko opened this issue 1 year ago • 19 comments

In Python datetime.datetime type year has to be in range [MINYEAR, MAXYEAR]. This range is not the same as possible timestamps in scylla. Previously if timestamp was outside this range it made driver raise an Exception. It was not correct behavior. There was a work around implemented in cqlsh.

This PR introduces a Datetime type to accommodate ranges outside datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR), this type falls back to printing milliseconds_from_epoch offset.

Something similar was introduced before for datetime.date type - https://github.com/scylladb/python-driver/commit/4f3c77c970e10774243e7ad7baed2d41c06c4790.

Fixes: #255

sylwiaszunejko avatar Mar 12 '24 09:03 sylwiaszunejko

I'm not sure about a few things:

  • is using milliseconds the right choice?
  • should I add support for passing strings as an argument to the Datetime constructor?

sylwiaszunejko avatar Mar 12 '24 09:03 sylwiaszunejko

@avelanarius @Lorak-mmk I made some extra changes to make CI work, also I removed one test test_row_error_message because this test required driver to raise an Exception when timestamp is outside the range [MIN|MAX]YEAR, but after my change this is not a correct behavior anymore.

sylwiaszunejko avatar Mar 16 '24 14:03 sylwiaszunejko

Maybe you can add the two examples I had in https://github.com/scylladb/python-driver/issues/255#issuecomment-1963632130 and https://github.com/scylladb/python-driver/issues/255#issuecomment-1963652325 as tests? These two cases used to show broken behavior before you patch, and we'll probably want to know it was fixed by the patch.

nyh avatar Mar 18 '24 11:03 nyh

@nyh I added the test you asked about

sylwiaszunejko avatar Mar 19 '24 11:03 sylwiaszunejko

You should also update the docs page about this change (https://github.com/scylladb/python-driver/blob/master/docs/dates-and-times.rst) and possibly other pages if they mention the old way the driver parsed datetimes.

avelanarius avatar Mar 19 '24 16:03 avelanarius

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

avelanarius avatar Mar 19 '24 16:03 avelanarius

As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.

What kind of test do you mean? .util.Datetime is tested every time get_sample is called on timestamp and it happens multiple times in test_types.py for example in test_can_insert_primitive_datatypes

sylwiaszunejko avatar Mar 20 '24 08:03 sylwiaszunejko

Two things,

this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

fruch avatar Mar 20 '24 18:03 fruch

Two things,

this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.

We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

Lorak-mmk avatar Mar 20 '24 21:03 Lorak-mmk

Two things, this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken. We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)

This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...

@Lorak-mmk What do you suggest? Because this issue was discussed in upstream years ago, but they decided to do workaround in cqlsh instead of fixing it https://issues.apache.org/jira/browse/CASSANDRA-10625. @nyh was against this solution and said we should fix it, maybe we should have a discussion about that? I see there are a lot of things to consider. @avelanarius @fruch

sylwiaszunejko avatar Mar 21 '24 07:03 sylwiaszunejko

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field. If that’s the case, there shouldn’t be an urgency to merge it for now.

roydahan avatar Mar 21 '24 10:03 roydahan

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.

When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

If that’s the case, there shouldn’t be an urgency to merge it for now.

fruch avatar Mar 21 '24 10:03 fruch

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

nyh avatar Mar 21 '24 10:03 nyh

If the problem is not encountered by users and is generally not severe then I don't think it's worth it to break compatibility to fix it - and I suspect upstream will say the same.

Lorak-mmk avatar Mar 21 '24 11:03 Lorak-mmk

@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.

I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.

And once when scylla-bench was randomizing dates with all 64bits, which was fixed.

I don't think we got a single report for a user about this.

This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)

You are correct it doesn't fully adhere to the CQL spec in that

I think those really need that kind of dates, can override instruct the driver not to convert, or do the same hack done in cqlsh.

Still we can suggest it upstream, and maybe they would be convinced

fruch avatar Mar 21 '24 11:03 fruch

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream? Maybe you have any idea how to fix the issue without breaking compatibility?

sylwiaszunejko avatar Mar 25 '24 08:03 sylwiaszunejko

@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream

Raise an issue, and open a PR similar to this one.

it might take some time until they would react to those, but that how we were doing those so far.

I guess you can find some cassandra mailing to discuss as well

fruch avatar Mar 25 '24 08:03 fruch

Can this PR be closed now that upstream PR is open?

Lorak-mmk avatar Apr 29 '24 14:04 Lorak-mmk

We shall assume that the upstream PR won't get noticed for quite a while, so the question is what to do with the issue this PR claims to fix: https://github.com/scylladb/python-driver/issues/255

It's ok to provide just an answer there or reference to relevant docs and close the issue if we believe this is what's need to be done.

roydahan avatar Apr 30 '24 14:04 roydahan

We are not merging it to not break backwards compatibility, but if someone needs datetimes outside datetime.[MIN|MAX]YEAR feel free to cherry-pick.

sylwiaszunejko avatar May 09 '24 12:05 sylwiaszunejko