POC: Try to optimize `columnized_output_exprs`
Which issue does this PR close?
Just a POC for https://github.com/apache/datafusion/issues/13015#issuecomment-2424113317
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
The latest commit doesn't make the performance better. Just clean up the codebase.
I may be misreading this, but it seems like columnize_output_expr is called in a loop for each expression being projected:
https://github.com/apache/datafusion/blob/ac827abe1b66b1dfa02ce65ae857477f68667843/datafusion/expr/src/logical_plan/builder.rs#L1505-L1511
However, then for each expr it builds up the same hash table: https://github.com/apache/datafusion/blob/ac827abe1b66b1dfa02ce65ae857477f68667843/datafusion/expr/src/utils.rs#L879-L883
Maybe we could refactor the code so the hash table is built once rather than once per expression? Something like
let columizer = Columnizer(plan);
for e in exprs {
columnizer.columnize(e)
}
🤔
coupled with your changes to avoid hash map might make a significant difference in planning
coupled with your changes to avoid hash map might make a significant difference in planning
Thanks, @alamb. I'll check it in a few days. If it makes sense, I will do the benchmark again 👍
However, then for each expr it builds up the same hash table:
https://github.com/apache/datafusion/blob/ac827abe1b66b1dfa02ce65ae857477f68667843/datafusion/expr/src/utils.rs#L879-L883
It doesn't work for this case because the hash tables are different. The case is
SELECT max(a1), max(a2), ... max(a200) from t
I think sharing the hash map may work in some cases like
SELECT max(a1), max(a1) + 1, max(a1) +2, ... from t
We only need to hash max(a1) once, and then look up the column reference many times. In physical_select_aggregates_from_200 case, we still need to hash every expression.
Thanks for double checking @goldmedal
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.