[WIP][SPARK-48000][SQL] Enable hash join support for non-binary collations
What changes were proposed in this pull request?
Enable hash join support for join keys that use non-binary collations.
Why are the changes needed?
Improve JOIN performance for collated strings, arrays of strings, and structs with strings.
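The following minimal Python sketch shows why hash join needs extra handling for non-binary collations. It is not Spark's implementation. A hash join only pairs rows whose keys hash and compare equal, so two keys that are equal under a case-insensitive collation but differ byte-wise (e.g. "Belgrade" and "belgrade") are missed unless both sides are hashed on a normalized collation key. The sketch models a case-insensitive collation with `str.lower()` and treats Python lists as arrays and tuples as structs. Both of those are assumptions, and the names `lcase_key` and `hash_join` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Hashable, List, Sequence, Tuple

Row = Tuple[Any, Any]


def lcase_key(value: Any) -> Hashable:
    """Hypothetical collation key for a case-insensitive collation.

    Strings are lowercased (an assumption standing in for real collation
    rules). Lists (arrays) and tuples (structs) are converted element by
    element into tuples, so nested strings are normalized and the result
    is hashable.
    """
    if isinstance(value, str):
        return value.lower()
    if isinstance(value, (list, tuple)):
        return tuple(lcase_key(v) for v in value)
    return value


def hash_join(build: Sequence[Row], probe: Sequence[Row],
              key_fn: Callable[[Any], Hashable]) -> List[Tuple[Row, Row]]:
    """Inner equi-join on the first column of each row.

    The build side is hashed on key_fn(key). Each probe row is then looked
    up with the same key_fn, so keys that are equal under the collation
    land in the same bucket.
    """
    table = defaultdict(list)
    for row in build:
        table[key_fn(row[0])].append(row)
    out = []
    for row in probe:
        for match in table.get(key_fn(row[0]), []):
            out.append((match, row))
    return out


if __name__ == "__main__":
    build = [("Belgrade", 1), ("ZURICH", 2)]
    probe = [("belgrade", "a"), ("zurich", "b"), ("Paris", "c")]
    # Hashing the raw (binary) strings finds no case-insensitive matches.
    print(hash_join(build, probe, lambda k: k))  # → []
    # Hashing the collation key finds both matches.
    print(hash_join(build, probe, lcase_key))
    # Arrays and structs containing strings are normalized recursively.
    print(hash_join([(["A", "b"], 3)], [(["a", "B"], "d")], lcase_key))
```

The design point is that hashing and equality must agree: if two keys compare equal under the collation, they must produce the same hash. Normalizing each key to a collation key before hashing gives that property without changing the join algorithm itself.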
Does this PR introduce any user-facing change?
No.
How was this patch tested?
End-to-end SQL tests in CollationSuite, plus the existing TPC-DS collation query test suite.
Was this patch authored or co-authored using generative AI tooling?
Yes.
This is ready for review. @dbatomic @cloud-fan, please provide some feedback on this approach. Also adding the rest of the Belgrade SQL team: @mihailom-db @nikolamand-db @stefankandic @stevomitric
Can you add a more detailed explanation of the actual implementation to the PR description?