Support OrderBy and Sort in Expr->String
Is your feature request related to a problem or challenge?
The current implementation in #9936 for Sort is probably not correct.
It ignores asc and null_first field.
IMO, the better struct for Sort and order_by is OrderByExpr
https://github.com/sqlparser-rs/sqlparser-rs/blob/deaa6d8151c8f90a7e9cbb624876fe0e8b8a165d/src/ast/query.rs#L1444-L1450
However, OrderByExpr is not an enum item in Expr
Describe the solution you'd like
two steps:
- make
OrderByExpran enum inExpr(change in sql-parser) - make
Sortreturn aOrderByExprduring Expr->String
Describe alternatives you've considered
No response
Additional context
No response
@kevinmingtarja @alamb What do you think? 🤔
I don't think we should change SQL parser
In general, ORDER BY ... isn't really a general Expr (it can't appear in arbitrary locations in SQL, only specific ones)
For example, this isn't valid;
SELECT x ORDER BY y FROM foo;
ORDER BY can appear in certain places like
SELECT x FROM foo ORDER BY y
SELECT agg(x ORDER BY y) FROM foo;
SELECT agg(..) OVER (ORDER BY y) FROM foo
I think we should special case finding Expr::Sort exprs in those locations.
Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔
Yeah I was kinda wondering about this too last time (https://github.com/apache/datafusion/issues/9726#issuecomment-2032753357).
I was thinking we don't change the SQL parser too.
Instead, I think given that the "sources" of the conversion to Expr is not only ast::Expr objects, having its return type be only ast::Expr is perhaps too restrictive? As we've seen in this case. If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to ast::Function as it's also not an enum item in ast::Expr.
So what if we extend the return type of expr_to_sql to more than ast::Expr? I think it makes sense to extend it to all the "sources" of conversions from ast::* to Expr.
For example in here, we know that we are converting a ast::OrderByExpr to a Expr::Sort, so I think it makes sense that ast::OrderByExpr should also be a valid return type for expr_to_sql, as that is really the "source" data model of Expr::Sort.
https://github.com/apache/datafusion/blob/f8c623fe045d70a87eac8dc8620b74ff73be56d5/datafusion/sql/src/expr/order_by.rs#L24-L36
Thank you all for your reply!
when turning the expression back into an entire query
Yeah, I was just a little curious why for Expr::Sort we ignore the ORDER BY information. It seems that for the entire query, it could address the Sort. https://github.com/apache/datafusion/blob/f8c623fe045d70a87eac8dc8620b74ff73be56d5/datafusion/sql/src/unparser/plan.rs#L222-L231
Maybe the code would be clearer if we made the conversion code return an Error if it encountered a SortExpr in an unexpected location 🤔
Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that.
I know a use case would be just convert a Expr to sql. But col("a").sort(true, true) will be only "a", which doesn't match the sql semantic? 🤔
If I understand correctly, I think another case is Expr::AggregateFunction, where we can't really convert it back to [ast::Function] (https://docs.rs/sqlparser/latest/sqlparser/ast/struct.Function.html#) as it's also not an enum item in ast::Expr.
ast::Function is an enum item in ast::Expr, see
https://github.com/sqlparser-rs/sqlparser-rs/blob/0b5722afbfb60c3e3a354bc7c7dc02cb7803b807/src/ast/mod.rs#L683-L684
So maybe the return type of expr_to_sql being ast::Expr is enough, given the Sort will be addressed in a special way?
Yeah, so if we encountered a SortExpr in expr_to_sql function, does that mean unexpected location ? I think for a entire query case, the code will probably never reach that.
I think so
But
col("a").sort(true, true)will be only "a", which doesn't match the sql semantic? 🤔
Creating SQL from programatically created Exprs is an interesting usecase.
it seems like the issue is that Expr in datafusion can come from both ast::Expr and ast::OrderByExpr
Currently we have https://github.com/apache/datafusion/blob/dd5683745e7d527b01b804c8f4f1a0a53aa225e8/datafusion/sql/src/unparser/expr.rs#L47-L50
Maybe we could change the signature to something like
/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
// SQL Expression
Expr(ast::Expr),
// SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
OrderByExpr(ast::OrderByExpr),
}
pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> {
...
}
Good, I can try to implement it. :)
Another concern for me is changing the function signature to pub fn expr_to_sql(expr: &Expr) -> Result<Unparsed> would lead every call of expr_to_sql a convert to ast::Expr in the original code, would this be a good choice? 🤔
For example, There would be something like
let l = self.expr_to_sql(left.as_ref())?;
let expr = match l {
Unparsed::Expr(expr) => expr,
Unparsed::OrderByExpr(_) => internal_err(),
};
Maybe we could make the API easier to use like:
/// DataFusion's Exprs can represent either an `Expr` or an `OrderByExpr`
enum Unparsed {
// SQL Expression
Expr(ast::Expr),
// SQL ORDER BY expression (e.g. `col ASC NULLS FIRST`)
OrderByExpr(ast::OrderByExpr),
}
/// Throws an error if Expr can not be represented by an `ast::Expr`
pub fn expr_to_sql(expr: &Expr) -> Result<sql::ast::Expr> {
...
}
/// Returns an `Unparsed` based on the expression
pub fn expr_to_unparsed(expr: &Expr) -> Result<Unparsed> {
...
}
And that way you example would be
let expr = self.expr_to_sql(left.as_ref())?;
In places where order by may be possible then the user could call the self.expr_to_unparsed 🤔
That looks better! Will try to implement it soon.