Introduce Datetime type for ranges outside datetime.[MIN|MAX]YEAR
In Python datetime.datetime type year has to be in range [MINYEAR, MAXYEAR]. This range is not the same as possible timestamps in scylla. Previously if timestamp was outside this range it made driver raise an Exception. It was not correct behavior. There was a work around implemented in cqlsh.
This PR introduces a Datetime type to accommodate ranges outside datetime.[MIN|MAX]YEAR. For Datetimes that cannot be represented as a datetime.datetime (because datetime.MINYEAR, datetime.MAXYEAR), this type falls back to printing milliseconds_from_epoch offset.
Something similar was introduced before for datetime.date type - https://github.com/scylladb/python-driver/commit/4f3c77c970e10774243e7ad7baed2d41c06c4790.
Fixes: #255
I'm not sure about a few things:
- is using milliseconds the right choice?
- should I add support for passing strings as an argument to the Datetime constructor?
@avelanarius @Lorak-mmk I made some extra changes to make CI work, also I removed one test test_row_error_message because this test required driver to raise an Exception when timestamp is outside the range [MIN|MAX]YEAR, but after my change this is not a correct behavior anymore.
Maybe you can add the two examples I had in https://github.com/scylladb/python-driver/issues/255#issuecomment-1963632130 and https://github.com/scylladb/python-driver/issues/255#issuecomment-1963652325 as tests? These two cases used to show broken behavior before you patch, and we'll probably want to know it was fixed by the patch.
@nyh I added the test you asked about
You should also update the docs page about this change (https://github.com/scylladb/python-driver/blob/master/docs/dates-and-times.rst) and possibly other pages if they mention the old way the driver parsed datetimes.
As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.
As for some additional testing, I think you should add an additional small test to https://github.com/scylladb/python-driver/blob/27122038b9477eb30ad43f9afdb9f0d79e25182f/tests/integration/standard/test_types.py.
What kind of test do you mean? .util.Datetime is tested every time get_sample is called on timestamp and it happens multiple times in test_types.py for example in test_can_insert_primitive_datatypes
Two things,
this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.
We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)
Two things,
this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken.
We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)
This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...
Two things, this change might break users code, since it's returning a different type than it used to. Duck typing and all, people that have code doing things based on return types, might get broken. We should also offer this to upstream and hear their input, if it would be better to be aligned with them on such a thing, especially if it's not scylla specific.(last time we deviated, with asyncio as default in python 3.12, I still need to revert it back)
This is absolutely a breaking change, I wonder if we shouldn't bump our major version number because of it. The downside is that syncing with upstream would become even more problematic...
@Lorak-mmk What do you suggest? Because this issue was discussed in upstream years ago, but they decided to do workaround in cqlsh instead of fixing it https://issues.apache.org/jira/browse/CASSANDRA-10625. @nyh was against this solution and said we should fix it, maybe we should have a discussion about that? I see there are a lot of things to consider. @avelanarius @fruch
@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.
When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field. If that’s the case, there shouldn’t be an urgency to merge it for now.
@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start.
When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.
I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.
And once when scylla-bench was randomizing dates with all 64bits, which was fixed.
I don't think we got a single report for a user about this.
If that’s the case, there shouldn’t be an urgency to merge it for now.
@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.
I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.
And once when scylla-bench was randomizing dates with all 64bits, which was fixed.
I don't think we got a single report for a user about this.
This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)
If the problem is not encountered by users and is generally not severe then I don't think it's worth it to break compatibility to fix it - and I suspect upstream will say the same.
@sylwiaszunejko thanks for owning this. I think that @fruch suggestion to suggest it to the upstream and get their feedback is a good start. When we have that feedback, we can get back to it. IIRC this is an edge cases that we rarly meet in testing and never seen in the field.
I know we have seen in two occasions, one Scylla core issue, on internal truncate table one field was in nanosecond and not milisec, and hence overriding.
And once when scylla-bench was randomizing dates with all 64bits, which was fixed.
I don't think we got a single report for a user about this.
This is not surprising, given that the problem is about dates millions of years into the past or future. Maybe not a lot of paleontologists or science-fiction writers use Scylla :-)
You are correct it doesn't fully adhere to the CQL spec in that
I think those really need that kind of dates, can override instruct the driver not to convert, or do the same hack done in cqlsh.
Still we can suggest it upstream, and maybe they would be convinced
@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream? Maybe you have any idea how to fix the issue without breaking compatibility?
@avelanarius @fruch @nyh @Lorak-mmk What should be my next steps? How should I contact upstream
Raise an issue, and open a PR similar to this one.
it might take some time until they would react to those, but that how we were doing those so far.
I guess you can find some cassandra mailing to discuss as well
Can this PR be closed now that upstream PR is open?
We shall assume that the upstream PR won't get noticed for quite a while, so the question is what to do with the issue this PR claims to fix: https://github.com/scylladb/python-driver/issues/255
It's ok to provide just an answer there or reference to relevant docs and close the issue if we believe this is what's need to be done.
We are not merging it to not break backwards compatibility, but if someone needs datetimes outside datetime.[MIN|MAX]YEAR feel free to cherry-pick.