[SPARK-48697][SQL] Add collation aware string filters

Open stefankandic opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

Adds a new class of filters that are collation-aware.

Why are the changes needed?

#46760 added predicate widening for collated column references, but widening completely changes the filters: if Spark does not evaluate the original expression later, the query can return wrong results. In addition, data sources could never actually support these filters, because they would only ever see them as AlwaysTrue.
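
To make the failure mode concrete, here is a minimal toy sketch in plain Python. It does not use Spark's classes: `CollatedEqualTo`, `matches`, and `scan` are hypothetical names invented for illustration, and treating the `"UTF8_LCASE"` label as a lowercase comparison is a simplifying assumption. It only shows that a source handed `AlwaysTrue` returns every row, while a source handed a collation-aware filter can apply the original predicate itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlwaysTrue:
    """Toy stand-in for a filter that matches every row."""


@dataclass(frozen=True)
class CollatedEqualTo:
    """Hypothetical collation-aware equality filter (illustrative name only)."""
    column: str
    value: str
    collation: str


def matches(filt, row):
    """Evaluate a toy filter against one row (a dict)."""
    if isinstance(filt, AlwaysTrue):
        return True
    if isinstance(filt, CollatedEqualTo):
        # Assumption: the "UTF8_LCASE" label means case-insensitive comparison.
        if filt.collation == "UTF8_LCASE":
            return row[filt.column].lower() == filt.value.lower()
        return row[filt.column] == filt.value
    raise TypeError(f"unsupported filter: {filt!r}")


def scan(rows, pushed_filter):
    """Toy data source scan that applies exactly the filter it was given."""
    return [row for row in rows if matches(pushed_filter, row)]


ROWS = [{"name": "Spark"}, {"name": "spark"}, {"name": "Flink"}]
original = CollatedEqualTo("name", "spark", "UTF8_LCASE")

# Widened filter: the source sees only AlwaysTrue and returns every row.
widened_result = scan(ROWS, AlwaysTrue())

# Collation-aware filter: the source can apply the original predicate.
aware_result = scan(ROWS, original)
```

In this sketch `widened_result` still contains the `Flink` row, so the result is only correct if something re-applies `original` after the scan.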

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New UTs.

Was this patch authored or co-authored using generative AI tooling?

No.

stefankandic avatar Jun 21 '24 16:06 stefankandic

I think there is some misunderstanding here. Filter pushdown has a few steps:

  1. Spark translates catalyst filters to data source filters, which can be a semantic subset, as some catalyst filters do not have corresponding data source filters.
  2. Spark pushes down the data source filters to data source implementation.
  3. Data source implementation tells Spark which filters need to be evaluated again on the Spark side. See DS v2 SupportsPushDownFilters.pushFilters, which returns the filters that still have to be evaluated by Spark.
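
The three steps above can be sketched as a toy model in plain Python. This is not Spark code: `translate`, `ToySource`, `push_filters`, and `run_query` are hypothetical names, predicates are simple `(kind, column, value)` tuples, and the choice of which kinds are translatable or fully handled by the source is an assumption made for illustration. The only behavior borrowed from the description is that the push step returns the filters Spark must still evaluate.

```python
# Assumption: only these predicate kinds have a data source equivalent.
TRANSLATABLE = {"eq", "gt"}


def evaluate(pred, row):
    """Evaluate a toy (kind, column, value) predicate against a row dict."""
    kind, column, value = pred
    if kind == "eq":
        return row[column] == value
    if kind == "gt":
        return row[column] > value
    if kind == "startswith_ci":
        return row[column].lower().startswith(value.lower())
    raise ValueError(f"unknown predicate kind: {kind}")


def translate(predicates):
    """Step 1: keep the translatable subset, set the rest aside for Spark."""
    translated = [p for p in predicates if p[0] in TRANSLATABLE]
    untranslated = [p for p in predicates if p[0] not in TRANSLATABLE]
    return translated, untranslated


class ToySource:
    """Hypothetical source that fully handles only equality filters."""

    def __init__(self, rows):
        self.rows = rows
        self.pushed = []

    def push_filters(self, filters):
        """Steps 2 and 3: accept filters, return those Spark must re-evaluate."""
        self.pushed = [f for f in filters if f[0] == "eq"]
        return [f for f in filters if f[0] != "eq"]

    def scan(self):
        return [r for r in self.rows if all(evaluate(f, r) for f in self.pushed)]


def run_query(source, predicates):
    translated, untranslated = translate(predicates)
    post_scan = source.push_filters(translated)
    # The engine side evaluates everything the source did not fully handle.
    residual = untranslated + post_scan
    return [r for r in source.scan() if all(evaluate(p, r) for p in residual)]
```

For example, with predicates `("eq", "name", "spark")` and `("gt", "v", 2)`, the toy source applies only the equality filter during the scan and hands the `gt` filter back, so `run_query` applies it afterwards.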

I don't get why we need TranslatedFilter, as the problem is not from the translation layer.

cloud-fan avatar Jun 24 '24 15:06 cloud-fan

@cloud-fan I made some changes per our discussion. Let me know what you think.

stefankandic avatar Jun 27 '24 12:06 stefankandic

thanks, merging to master!

cloud-fan avatar Jul 01 '24 15:07 cloud-fan