[SPARK-51554][SQL] Add the time_trunc() function for TIME datatype
What changes were proposed in this pull request?
- Added a new built-in function
time_trunc(unit, expr)that returns aTIMEvalue truncated to the specified unit. - Allowed input for expr to be either a
TIMEtype or a string that can be cast to TIME. - Supported truncation units are
HOUR,MINUTE,SECOND,MILLISECOND, andMICROSECOND. - Handles both foldable and non-foldable inputs
Why are the changes needed?
- Spark currently lacks a built-in function for truncating TIME values in a similar way as
truncTimestamp.
Does this PR introduce any user-facing change?
Yes. A new built-in function time_trunc is added. Users can call the function to truncate TIME values to one of the above mentioned supported units.
How was this patch tested?
By running newly added UTs:
$ build/sbt "test:testOnly *TimeExpressionsSuite.scala"
By manual tests:
# Happy test cases
scala> spark.sql("SELECT time_trunc('HOUR', '09:32:05.123456');").show()
+---------------------------------+
|time_trunc(HOUR, 09:32:05.123456)|
+---------------------------------+
| 09:00:00|
+---------------------------------+
scala> spark.sql("SELECT time_trunc('MINUTE', TIME'09:32:05.123456');").show()
+------------------------------------------+
|time_trunc(MINUTE, TIME '09:32:05.123456')|
+------------------------------------------+
| 09:32:00|
+------------------------------------------+
scala> spark.sql("SELECT time_trunc('second', '09:32:05.123456');").show()
+-----------------------------------+
|time_trunc(second, 09:32:05.123456)|
+-----------------------------------+
| 09:32:05|
+-----------------------------------+
scala> spark.sql("SELECT time_trunc(concat('milli','second'), '09:32:05.123456');").show()
+--------------------------------------------------+
|time_trunc(concat(milli, second), 09:32:05.123456)|
+--------------------------------------------------+
| 09:32:05.123|
+--------------------------------------------------+
scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.123456)|
+----------------------------------------+
| 09:32:05.123456|
+----------------------------------------+
scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.1234');").show()
+--------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.1234)|
+--------------------------------------+
| 09:32:05.1234|
+--------------------------------------+
# Invalid inputs
scala> spark.sql("SELECT time_trunc('MS', '09:32:05.123456');").show()
+-------------------------------+
|time_trunc(MS, 09:32:05.123456)|
+-------------------------------+
| NULL|
+-------------------------------+
scala> spark.sql("SELECT time_trunc('MICROSECOND', '29:32:05.123456');").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '29:32:05.123456' of the type "STRING" cannot be cast to "TIME(6)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
SELECT time_trunc('MICROSECOND', '29:32:05.123456');
# unfoldable inputs
scala> val df = Seq(
| ("HOUR", "09:32:05.123456"),
| ("MINUTE", "10:20:15.123456"),
| ("second", "11:59:59.999999"),
| ("MILLISECOND","00:00:00.123000"),
| ("MICROSECOND","00:00:00.123000"),
| ("MICROSECOND","00:00:00.123456"),
| ("MS","00:00:00.123456"),
| ("NULL","00:00:00.123456")
| ).toDF("unitcol", "timecol")
val df: org.apache.spark.sql.DataFrame = [unitcol: string, timecol: string]
scala> val timeDf = df.selectExpr("unitcol", "CAST(timecol AS TIME(6)) as timeval")
val timeDf: org.apache.spark.sql.DataFrame = [unitcol: string, timeval: time(6)]
scala> timeDf.createOrReplaceTempView("tmp")
scala> spark.sql("""
| SELECT
| unitcol,
| timeval,
| time_trunc(unitcol, timeval) as truncated
| FROM tmp
| """).show(false)
+-----------+---------------+---------------+
|unitcol |timeval |truncated |
+-----------+---------------+---------------+
|HOUR |09:32:05.123456|09:00:00 |
|MINUTE |10:20:15.123456|10:20:00 |
|second |11:59:59.999999|11:59:59 |
|MILLISECOND|00:00:00.123 |00:00:00.123 |
|MICROSECOND|00:00:00.123 |00:00:00.123 |
|MICROSECOND|00:00:00.123456|00:00:00.123456|
|MS |00:00:00.123456|NULL |
|NULL |00:00:00.123456|NULL |
+-----------+---------------+---------------+
Was this patch authored or co-authored using generative AI tooling?
No.
@MaxGekk While I am trying to convert this into a RuntimeReplaceable version and add the UTs for this, would appreciate any feedbacks from you on this, meanwhile!
@MaxGekk Please do let me know your thoughts on this one! Have updated the revision.
@MaxGekk Any chance I can get a review here, please?
Hey @MaxGekk , could I get a review here, please? Looking forward to the comments.
@MaxGekk can I get a review here please?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!