
[SPARK-51554][SQL] Add the time_trunc() function for TIME datatype

Open the-sakthi opened this issue 9 months ago • 4 comments

What changes were proposed in this pull request?

  • Added a new built-in function time_trunc(unit, expr) that returns a TIME value truncated to the specified unit.
  • Allowed input for expr to be either a TIME type or a string that can be cast to TIME.
  • Supported truncation units are HOUR, MINUTE, SECOND, MILLISECOND, and MICROSECOND.
  • Handled both foldable and non-foldable inputs for the unit argument.
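The per-unit truncation semantics described above can be sketched in plain Python (an illustrative stand-in only, not the PR's Scala implementation; the function name mirrors the SQL one for readability):

```python
from datetime import time

# Illustrative sketch of the proposed truncation semantics (not the PR's Scala code).
# Unit matching is case-insensitive ('second' works like 'SECOND'), and an
# unsupported unit yields None, mirroring the NULL results shown below.
def time_trunc(unit, t):
    u = unit.upper()
    if u == "HOUR":
        return t.replace(minute=0, second=0, microsecond=0)
    if u == "MINUTE":
        return t.replace(second=0, microsecond=0)
    if u == "SECOND":
        return t.replace(microsecond=0)
    if u == "MILLISECOND":
        # keep whole milliseconds, drop the sub-millisecond digits
        return t.replace(microsecond=t.microsecond // 1000 * 1000)
    if u == "MICROSECOND":
        return t  # TIME(6) already carries microsecond precision
    return None  # unsupported unit, e.g. 'MS'

print(time_trunc("HOUR", time(9, 32, 5, 123456)))         # 09:00:00
print(time_trunc("MILLISECOND", time(9, 32, 5, 123456)))  # 09:32:05.123000
```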

Why are the changes needed?

  • Spark currently lacks a built-in function for truncating TIME values, analogous to what truncTimestamp provides for timestamps.

Does this PR introduce any user-facing change?

Yes. A new built-in function time_trunc is added. Users can call it to truncate TIME values to one of the supported units listed above.

How was this patch tested?

By running the newly added UTs:

$ build/sbt "test:testOnly *TimeExpressionsSuite"

By manual tests:

# Happy test cases
scala> spark.sql("SELECT time_trunc('HOUR', '09:32:05.123456');").show()
+---------------------------------+
|time_trunc(HOUR, 09:32:05.123456)|
+---------------------------------+
|                         09:00:00|
+---------------------------------+

scala> spark.sql("SELECT time_trunc('MINUTE', TIME'09:32:05.123456');").show()
+------------------------------------------+
|time_trunc(MINUTE, TIME '09:32:05.123456')|
+------------------------------------------+
|                                  09:32:00|
+------------------------------------------+

scala> spark.sql("SELECT time_trunc('second', '09:32:05.123456');").show()
+-----------------------------------+
|time_trunc(second, 09:32:05.123456)|
+-----------------------------------+
|                           09:32:05|
+-----------------------------------+

scala> spark.sql("SELECT time_trunc(concat('milli','second'), '09:32:05.123456');").show()
+--------------------------------------------------+
|time_trunc(concat(milli, second), 09:32:05.123456)|
+--------------------------------------------------+
|                                      09:32:05.123|
+--------------------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.123456');").show()
+----------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.123456)|
+----------------------------------------+
|                         09:32:05.123456|
+----------------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '09:32:05.1234');").show()
+--------------------------------------+
|time_trunc(MICROSECOND, 09:32:05.1234)|
+--------------------------------------+
|                         09:32:05.1234|
+--------------------------------------+

# Invalid inputs
scala> spark.sql("SELECT time_trunc('MS', '09:32:05.123456');").show()
+-------------------------------+
|time_trunc(MS, 09:32:05.123456)|
+-------------------------------+
|                           NULL|
+-------------------------------+

scala> spark.sql("SELECT time_trunc('MICROSECOND', '29:32:05.123456');").show()
org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '29:32:05.123456' of the type "STRING" cannot be cast to "TIME(6)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. SQLSTATE: 22018
== SQL (line 1, position 8) ==
SELECT time_trunc('MICROSECOND', '29:32:05.123456');
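The CAST_INVALID_INPUT error above is raised by the cast of the string to TIME, before truncation runs: '29:...' is out of range for the hour field. A minimal Python sketch of that validation, including the try_cast-style fallback the error message suggests (hypothetical helper names, not Spark's actual parser):

```python
from datetime import time

# Minimal sketch of why '29:32:05.123456' fails the cast to TIME(6):
# hours must be in [0, 23]. Mirrors the CAST_INVALID_INPUT behavior only.
def cast_to_time(s):
    hh, mm, rest = s.split(":")
    sec, _, frac = rest.partition(".")
    micro = int((frac + "000000")[:6]) if frac else 0
    try:
        # time() raises ValueError for out-of-range fields
        return time(int(hh), int(mm), int(sec), micro)
    except ValueError:
        raise ValueError(f"[CAST_INVALID_INPUT] '{s}' cannot be cast to TIME(6)")

def try_cast_to_time(s):
    # try_cast tolerates malformed input and returns NULL (None) instead
    try:
        return cast_to_time(s)
    except ValueError:
        return None

print(try_cast_to_time("29:32:05.123456"))  # None
```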

# Non-foldable inputs
scala> val df = Seq(
     |   ("HOUR",       "09:32:05.123456"),
     |   ("MINUTE",     "10:20:15.123456"),
     |   ("second",     "11:59:59.999999"),
     |   ("MILLISECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123000"),
     |   ("MICROSECOND","00:00:00.123456"),
     |   ("MS","00:00:00.123456"),
     |   ("NULL","00:00:00.123456")
     | ).toDF("unitcol", "timecol")
val df: org.apache.spark.sql.DataFrame = [unitcol: string, timecol: string]

scala> val timeDf = df.selectExpr("unitcol", "CAST(timecol AS TIME(6)) as timeval")
val timeDf: org.apache.spark.sql.DataFrame = [unitcol: string, timeval: time(6)]

scala> timeDf.createOrReplaceTempView("tmp")

scala> spark.sql("""
     |   SELECT
     |     unitcol,
     |     timeval,
     |     time_trunc(unitcol, timeval) as truncated
     |   FROM tmp
     | """).show(false)
+-----------+---------------+---------------+
|unitcol    |timeval        |truncated      |
+-----------+---------------+---------------+
|HOUR       |09:32:05.123456|09:00:00       |
|MINUTE     |10:20:15.123456|10:20:00       |
|second     |11:59:59.999999|11:59:59       |
|MILLISECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123   |00:00:00.123   |
|MICROSECOND|00:00:00.123456|00:00:00.123456|
|MS         |00:00:00.123456|NULL           |
|NULL       |00:00:00.123456|NULL           |
+-----------+---------------+---------------+
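The non-foldable case above, where the unit comes from a column and is resolved per row, can be sketched row-wise in Python (illustrative only, not the PR's Scala code; the dispatch table is a hypothetical stand-in for the expression's unit handling):

```python
from datetime import time

# Row-wise sketch of the non-foldable case: the unit is a column value,
# so it is looked up per row; unknown units map to None (SQL NULL).
TRUNC = {
    "HOUR":        lambda t: t.replace(minute=0, second=0, microsecond=0),
    "MINUTE":      lambda t: t.replace(second=0, microsecond=0),
    "SECOND":      lambda t: t.replace(microsecond=0),
    "MILLISECOND": lambda t: t.replace(microsecond=t.microsecond // 1000 * 1000),
    "MICROSECOND": lambda t: t,
}

rows = [("HOUR",   time(9, 32, 5, 123456)),
        ("second", time(11, 59, 59, 999999)),
        ("MS",     time(0, 0, 0, 123456))]   # unsupported unit -> NULL

truncated = [TRUNC.get(u.upper(), lambda t: None)(t) for u, t in rows]
print(truncated)  # [datetime.time(9, 0), datetime.time(11, 59, 59), None]
```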

Was this patch authored or co-authored using generative AI tooling?

No.

the-sakthi avatar Apr 16 '25 11:04 the-sakthi

@MaxGekk While I am converting this into a RuntimeReplaceable version and adding the UTs for it, I would appreciate any feedback from you in the meantime!

the-sakthi avatar Apr 16 '25 11:04 the-sakthi

@MaxGekk Please let me know your thoughts on this one! I have updated the revision.

the-sakthi avatar Apr 29 '25 10:04 the-sakthi

@MaxGekk Any chance I can get a review here, please?

the-sakthi avatar May 06 '25 13:05 the-sakthi

Hey @MaxGekk , could I get a review here, please? Looking forward to the comments.

the-sakthi avatar Jun 05 '25 23:06 the-sakthi

@MaxGekk can I get a review here please?

the-sakthi avatar Jul 22 '25 18:07 the-sakthi

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions[bot] avatar Oct 31 '25 00:10 github-actions[bot]