Allow omitting the "T" separator from a datetime string

Open riverszhang89 opened this issue 6 months ago • 4 comments

Our customers have trouble using Spark to create a DataFrame on a datetime column: Spark interpolates the provided datetime strings when filling the range, but drops the "T" separator from the values it generates.

Omitting the "T" appears to be widely accepted, and ISO 8601 does permit it in some cases (by mutual agreement between the communicating parties). This is a very simple patch to allow it.
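To illustrate what the relaxed rule accepts, here is a minimal C sketch of a parser for the basic `YYYY-MM-DD?HH:MM:SS` form, where the separator may be either "T" or a single space. The names `simple_dt`, `read_digits` and `parse_dt_any_sep` are hypothetical; this is not the code from the patch, and it deliberately ignores fractional seconds and timezones.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical broken-down datetime; not comdb2's internal type. */
struct simple_dt {
    int year, month, day, hour, minute, second;
};

/* Read exactly n ASCII digits from *p into *out and advance *p.
 * Stops at the first non-digit (including '\0'), so it never over-reads. */
static int read_digits(const char **p, int n, int *out)
{
    int v = 0;
    for (int i = 0; i < n; i++) {
        if (!isdigit((unsigned char)(*p)[i]))
            return 0;
        v = v * 10 + ((*p)[i] - '0');
    }
    *p += n;
    *out = v;
    return 1;
}

/* Parse "YYYY-MM-DD", then 'T' or ' ', then "HH:MM:SS".
 * Returns 1 on success, 0 if the string is malformed. */
static int parse_dt_any_sep(const char *s, struct simple_dt *dt)
{
    const char *p = s;
    assert(s != NULL && dt != NULL);

    if (!read_digits(&p, 4, &dt->year) || *p++ != '-' ||
        !read_digits(&p, 2, &dt->month) || *p++ != '-' ||
        !read_digits(&p, 2, &dt->day))
        return 0;

    /* The relaxed rule: accept the ISO 8601 'T' or a single space. */
    if (*p != 'T' && *p != ' ')
        return 0;
    p++;

    if (!read_digits(&p, 2, &dt->hour) || *p++ != ':' ||
        !read_digits(&p, 2, &dt->minute) || *p++ != ':' ||
        !read_digits(&p, 2, &dt->second))
        return 0;

    /* This sketch rejects any trailing text (fractions, timezones). */
    return *p == '\0';
}
```

With this rule, "2025-01-01T12:34:56" and "2025-01-01 12:34:56" yield the same fields, while any other separator is still rejected.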

riverszhang89 · Aug 14 '25 22:08

If we allow strings without T to be converted to datetimes, we have the following conundrum: is "20250101" a datetime or an epoch value?

dorinhogea · Aug 15 '25 20:08

If the string contains only digits, it's treated as a Unix timestamp (code), so "20250101" is still an epoch value. I've also added a test for it.
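The precedence described above can be sketched as a classifier that applies the digits-only rule before any datetime parsing, so relaxing the separator cannot change how "20250101" is read. This is a hypothetical illustration (`classify`, `all_digits` and `enum dt_kind` are made-up names), and it assumes an unsigned, digits-only epoch with no sign handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical classification of an input string. */
enum dt_kind { DT_EPOCH, DT_DATETIME };

/* Returns 1 if s is non-empty and consists only of ASCII digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* The digits-only check runs first, so "20250101" stays an epoch value
 * even though the "T" separator is now optional. */
static enum dt_kind classify(const char *s, long long *epoch)
{
    assert(s != NULL && epoch != NULL);
    if (all_digits(s)) {
        *epoch = strtoll(s, NULL, 10);
        return DT_EPOCH;
    }
    /* Everything else goes to the datetime parser (not shown). */
    return DT_DATETIME;
}
```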

riverszhang89 · Aug 18 '25 14:08

Does this break the UDF and consumer tests?

akshatsikarwar · Aug 18 '25 18:08

There's one case I overlooked, e.g. "2025-01-01 UTC". The type code apparently expects a timezone to have a leading space. But because we now also allow a space between the date portion and the rest of the datetime, that space may already have been consumed by the time we read the timezone. For instance:

2025-01-01 UTC
          ^ old code would stop here, timezone would be "[space]UTC"
vs

2025-01-01 UTC
           ^ new code stops here, timezone is "UTC"
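One way to make the timezone reader independent of whether the separator space was already consumed is to skip any leading spaces instead of requiring exactly one. The sketch below contrasts the two behaviors on the example above; `tz_strict` and `tz_tolerant` are hypothetical helpers, not functions from comdb2 and not necessarily the fix adopted in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Strict reader, modeled on the old expectation: the timezone text
 * must begin with a space. Returns NULL otherwise. */
static const char *tz_strict(const char *rest)
{
    assert(rest != NULL);
    return rest[0] == ' ' ? rest + 1 : NULL;
}

/* Tolerant reader: skip zero or more leading spaces, so the result is
 * the same whether or not the separator space was already consumed. */
static const char *tz_tolerant(const char *rest)
{
    assert(rest != NULL);
    rest += strspn(rest, " ");
    return *rest != '\0' ? rest : NULL;
}
```

For "2025-01-01 UTC", both readers return "UTC" when handed the remainder starting at the space (offset 10), but only the tolerant reader does when handed the remainder starting at "U" (offset 11), which is where the new code stops.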

riverszhang89 · Aug 20 '25 18:08