datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Support OrderBy and Sort in Expr->String

Open yyy1000 opened this issue 1 year ago • 8 comments

Is your feature request related to a problem or challenge?

The current implementation in #9936 for Sort is probably not correct. It ignores asc and null_first field. IMO, the better struct for Sort and order_by is OrderByExpr https://github.com/sqlparser-rs/sqlparser-rs/blob/deaa6d8151c8f90a7e9cbb624876fe0e8b8a165d/src/ast/query.rs#L1444-L1450 However, OrderByExpr is not an enum item in Expr

Describe the solution you'd like

two steps:

  1. make OrderByExpr an enum in Expr (change in sql-parser)
  2. make Sort return a OrderByExpr during Expr->String

Describe alternatives you've considered

No response

Additional context

No response

yyy1000 avatar Apr 26 '24 19:04 yyy1000

@kevinmingtarja @alamb What do you think? 🤔

yyy1000 avatar Apr 26 '24 19:04 yyy1000

I don't think we should change SQL parser

In general, ORDER BY ... isn't really a general Expr (it can't appear in arbitrary locations in SQL, only specific ones)

For example, this isn't valid;

SELECT x ORDER BY y FROM foo;

ORDER BY can appear in certain places like

SELECT x FROM foo ORDER BY y
SELECT agg(x ORDER BY y) FROM foo;
SELECT agg(..) OVER (ORDER BY y) FROM foo

I think we should special case finding Expr::Sort exprs in those locations.

Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔

alamb avatar Apr 27 '24 10:04 alamb

Yeah I was kinda wondering about this too last time (https://github.com/apache/datafusion/issues/9726#issuecomment-2032753357).

I was thinking we don't change the SQL parser too.

Instead, I think given that the "sources" of the conversion to Expr is not only ast::Expr objects, having its return type be only ast::Expr is perhaps too restrictive? As we've seen in this case. If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to ast::Function as it's also not an enum item in ast::Expr.

So what if we extend the return type of expr_to_sql to more than ast::Expr? I think it makes sense to extend it to all the "sources" of conversions from ast::* to Expr.

For example in here, we know that we are converting a ast::OrderByExpr to a Expr::Sort, so I think it makes sense that ast::OrderByExpr should also be a valid return type for expr_to_sql, as that is really the "source" data model of Expr::Sort.

https://github.com/apache/datafusion/blob/f8c623fe045d70a87eac8dc8620b74ff73be56d5/datafusion/sql/src/expr/order_by.rs#L24-L36

kevinmingtarja avatar Apr 27 '24 13:04 kevinmingtarja

Thank you all for your reply!

when turning the expression back into an entire query

Yeah, I was just a little curious why for Expr::Sort we ignore the ORDER BY information. It seems that for the entire query, it could address the Sort. https://github.com/apache/datafusion/blob/f8c623fe045d70a87eac8dc8620b74ff73be56d5/datafusion/sql/src/unparser/plan.rs#L222-L231

Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔

Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that. I know a use case would be just convert a Expr to sql. But col("a").sort(true, true) will be only "a", which doesn't match the sql semantic? 🤔

If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to [ast::Function] (https://docs.rs/sqlparser/latest/sqlparser/ast/struct.Function.html#) as it's also not an enum item in ast::Expr.

ast::Function is an enum item in ast::Expr, see https://github.com/sqlparser-rs/sqlparser-rs/blob/0b5722afbfb60c3e3a354bc7c7dc02cb7803b807/src/ast/mod.rs#L683-L684 So maybe the return type of expr_to_sql being ast::Expr is enough, given the Sort will be addressed in a special way?

yyy1000 avatar Apr 27 '24 13:04 yyy1000

Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that.

I think so

But col("a").sort(true, true) will be only "a", which doesn't match the sql semantic? 🤔

Creating SQL from programatically created Exprs is an interesting usecase.

it seems like the issue is that Expr in datafusion can come from both ast::Expr and ast::OrderByExpr

Currently we have https://github.com/apache/datafusion/blob/dd5683745e7d527b01b804c8f4f1a0a53aa225e8/datafusion/sql/src/unparser/expr.rs#L47-L50

Maybe we could change the signature to something like

/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
  // SQL Expression
  Expr(ast::Expr),
  // SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
  OrderByExpr(ast::OrderByExpr),
}
 
pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> { 
...
 } 

alamb avatar Apr 30 '24 13:04 alamb

Good, I can try to implement it. :) Another concern for me is changing the function signature to pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> would lead every call of expr_to_sql a convert to ast::Expr in the original code, would this be a good choice? 🤔

For example, There would be something like

let l = self.expr_to_sql(left.as_ref())?;
let expr = match l {
    Unparsed::Expr(expr) => expr,
    Unparsed::OrderByExpr(_) => internal_err(),
};

yyy1000 avatar May 03 '24 05:05 yyy1000

Maybe we could make the API easier to use like:

/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
  // SQL Expression
  Expr(ast::Expr),
  // SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
  OrderByExpr(ast::OrderByExpr),
}
 
/// Throws an error if Expr can not be represented by an `ast::Expr` 
pub fn expr_to_sql(expr: &Expr) -> Result<sql::ast::Expr> { 
...
 } 

/// Returns an `Unparsed` based on the expression
pub fn expr_to_unparsed(expr: &Expr) -> Result<Unparsed> { 
...
 } 

And that way you example would be

let expr = self.expr_to_sql(left.as_ref())?;

In places where order by may be possible then the user could call the self.expr_to_unparsed 🤔

alamb avatar May 03 '24 12:05 alamb

That looks better! Will try to implement it soon.

yyy1000 avatar May 03 '24 15:05 yyy1000