datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Introduce user defined SQL planner API

Open jayzhan211 opened this issue 1 year ago • 1 comments

Which issue does this PR close?

Closes #11168 part of https://github.com/apache/datafusion/issues/10534

TLDR is to make it easier for users to customize sql planning, specifically to handle operator translation

Rationale for this change

Let sql planner customizable Follow up from #11168

What changes are included in this PR?

  1. Array function rewrite
  2. GetField
  3. Array literal

TODO in next PRs

#10534, to close this, we have more functions to move.

  • date_part
  • create_struct
  • create_named_struct
  • sql_overlay_to_expr
  • sql_position_to_expr
  • sql_substring_to_expr
  • sql_compound_identifier_to_expr

Are these changes tested?

Are there any user-facing changes?

jayzhan211 avatar Jun 30 '24 02:06 jayzhan211

Interesting, I got stack overflow in CI but not in local.

jayzhan211 avatar Jun 30 '24 08:06 jayzhan211

In general this looks great. I'm going to try to use it on datafusion-functions-json and let you know how I get on.

Should be able to get this done today, or tomorrow if life get's in the way.

samuelcolvin avatar Jul 01 '24 11:07 samuelcolvin

Thanks @alamb

jayzhan211 avatar Jul 02 '24 00:07 jayzhan211

I'm a bit surprised this got merged without waiting for me to try it on datafusion-functions-json!

I said just above that I would try to do that yesterday or today.

As far as I can see, this PR provides no way to register external custom query planners. I guess I'll work on a follow up...

samuelcolvin avatar Jul 02 '24 10:07 samuelcolvin

Thanks @samuelcolvin I agree iterating with some other PRs is a good idea.

alamb avatar Jul 02 '24 10:07 alamb

I wrote up https://github.com/apache/datafusion/issues/11207 to track moving the remaining SQL planning features to this API

alamb avatar Jul 02 '24 10:07 alamb

Sorry, I didn't follow up this topic before. I'm thinking if we can have one UserDefinedSQLPlanner trait which all methods have default official implementation, users just need to impl method they want to customize. We have to loop UserDefinedSQLPlanner instances in current design, it seems not very efficient and has much boilerplate code. And these UserDefinedSQLPlanner instaces might have conflicts.

lewiszlw avatar Jul 08 '24 08:07 lewiszlw

We have to loop UserDefinedSQLPlanner instances in current design, it seems not very efficient and has much boilerplate code. And these UserDefinedSQLPlanner instaces might have conflicts.

I agree the looping is not ideal.

In terms of conflicts, I think the semantics are pretty well defined now (the planners are ordered and whichever successfully plans the option first will be chosen).

alamb avatar Jul 08 '24 10:07 alamb