[SPARK-48697][SQL] Add collation aware string filters
What changes were proposed in this pull request?
Adds new classes of filters that are collation aware.
Why are the changes needed?
#46760 added predicate-widening logic for collated column references, but this completely changes the filters: if the original expression is not re-evaluated by Spark later, we could end up with wrong results. Also, data sources would never be able to actually support these widened filters; they would only ever see them as AlwaysTrue.
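As an illustrative sketch of the problem (the types below are hypothetical stand-ins, not Spark's real Filter hierarchy), predicate widening replaces a filter on a collated column with AlwaysTrue, so the data source cannot prune anything, and if Spark does not re-evaluate the original predicate, wrong rows survive:

```java
// Hypothetical mini-model of predicate widening; names are illustrative,
// not Spark's actual classes.
interface Filter {}

final class EqualTo implements Filter {
    final String column;
    final String value;
    final boolean collated; // true if the column uses a non-default collation
    EqualTo(String column, String value, boolean collated) {
        this.column = column;
        this.value = value;
        this.collated = collated;
    }
}

final class AlwaysTrue implements Filter {}

final class Widening {
    // A predicate on a collated column is widened to AlwaysTrue, because a
    // byte-wise comparison inside the data source would be incorrect under
    // the collation.
    static Filter widen(Filter f) {
        if (f instanceof EqualTo && ((EqualTo) f).collated) {
            return new AlwaysTrue();
        }
        return f;
    }
}
```

After widening, the data source sees only AlwaysTrue and returns every row; correctness then depends entirely on Spark re-evaluating the original catalyst expression.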
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New UTs.
Was this patch authored or co-authored using generative AI tooling?
No.
I think there is some misunderstanding here. Filter pushdown has a few steps:
- Spark translates catalyst filters to data source filters, which can be a semantic subset, as some catalyst filters have no corresponding data source filter.
- Spark pushes down the data source filters to data source implementation.
- Data source implementation tells Spark which filters need to be evaluated again on the Spark side. See the DS v2 SupportsPushDownFilters.pushFilters, which returns the filters that still have to be evaluated by Spark.
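The three steps above can be sketched as follows. The types are simplified stand-ins for catalyst expressions, data source filters, and the DS v2 SupportsPushDownFilters contract (whose pushFilters returns the filters the source cannot fully evaluate), not the real Spark API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Simplified stand-ins; not Spark's real classes.
final class CatalystFilter {
    final String description;
    final boolean translatable; // some catalyst filters have no source equivalent
    CatalystFilter(String description, boolean translatable) {
        this.description = description;
        this.translatable = translatable;
    }
}

final class SourceFilter {
    final String description;
    SourceFilter(String description) { this.description = description; }
}

final class PushdownFlow {
    // Step 1: translate catalyst filters; untranslatable ones stay Spark-side.
    static Optional<SourceFilter> translate(CatalystFilter f) {
        return f.translatable
            ? Optional.of(new SourceFilter(f.description))
            : Optional.empty();
    }

    // Steps 2 and 3: push translated filters down; the source reports back the
    // filters it cannot evaluate, which Spark must re-apply after the scan
    // (cf. SupportsPushDownFilters.pushFilters).
    static List<SourceFilter> pushFilters(List<SourceFilter> filters,
                                          Predicate<SourceFilter> supported) {
        List<SourceFilter> postScan = new ArrayList<>();
        for (SourceFilter f : filters) {
            if (!supported.test(f)) {
                postScan.add(f); // must be evaluated again by Spark
            }
        }
        return postScan;
    }
}
```

Under this model, a collation-aware filter a source does not understand is simply returned from pushFilters and re-evaluated by Spark; no widening to AlwaysTrue is needed at the translation layer.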
I don't get why we need TranslatedFilter, as the problem is not from the translation layer.
@cloud-fan I made some changes per our discussion, let me know what you think
thanks, merging to master!