[WIP][SPARK-48000][SQL] Enable hash join support for non-binary collations

Open uros-db opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

Enable hash join for strings with non-binary collations.

Why are the changes needed?

Improve JOIN performance for collated strings, arrays of strings, and structs with strings.
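
To illustrate the general idea only (this is not the code in this PR), here is a minimal Python sketch of a hash join that hashes a normalized collation key instead of the raw string. The helper names `collation_key` and `hash_join` are hypothetical, and `str.casefold()` is an assumed stand-in for a real collation key of a case-insensitive collation.

```python
from collections import defaultdict


def collation_key(s: str) -> str:
    # Hypothetical stand-in for a real collation key: models a
    # case-insensitive collation by case-folding the string.
    return s.casefold()


def hash_join(left, right, key=collation_key):
    """Inner-join two lists of (join_string, payload) rows.

    Rows match when their join strings are equal under `key`,
    not when their raw strings are identical.
    """
    # Build phase: bucket the right side by the normalized key.
    table = defaultdict(list)
    for r_str, r_payload in right:
        table[key(r_str)].append((r_str, r_payload))

    # Probe phase: look up each left row by its normalized key.
    joined = []
    for l_str, l_payload in left:
        for r_str, r_payload in table.get(key(l_str), []):
            joined.append((l_str, l_payload, r_str, r_payload))
    return joined
```

With the identity function as `key`, the same routine behaves like a plain binary-equality hash join, which is why strings that are equal under the collation but differ in raw form must be normalized before hashing.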

Does this PR introduce any user-facing change?

No.

How was this patch tested?

End-to-end SQL tests in CollationSuite and the existing TPCDS collation query test suite.

Was this patch authored or co-authored using generative AI tooling?

Yes.

uros-db, Apr 22 '24 13:04

This is ready for review. @dbatomic @cloud-fan, please provide some feedback on this approach. Also adding the rest of the Belgrade SQL team: @mihailom-db @nikolamand-db @stefankandic @stevomitric

uros-db, May 07 '24 11:05

Can you add a slightly more detailed comment to the PR description about the actual implementation?

dbatomic, May 07 '24 13:05