[BUG] `distinct_count_approx` does not work with array types
Query Information
PPL Query:
source = opensearch_dashboards_sample_data_ecommerce
| stats distinct_count_approx(`manufacturer`) as dc
Expected Result:
distinct_count_approx should return an approximate distinct count of the values in the manufacturer field, regardless of whether the field is a single value or an array.
Actual Result: The query fails when the target field is of array type:
{
"error": {
"reason": "There was internal problem at backend",
"details": "java.sql.SQLException: exception while executing query: class java.util.ArrayList cannot be cast to class java.lang.String (java.util.ArrayList and java.lang.String are in module java.base of loader 'bootstrap')",
"type": "RuntimeException"
},
"status": 500
}
Steps to Reproduce:
-
Create an index with an array-type field (e.g.,
manufactureras["A", "B"]). -
Run:
source = your_index | stats distinct_count_approx(`manufacturer`) -
Observe that the query fails with a
500 Internal Server Error.
I'm looking on this.
Previously, it fails since it doesn't pushdown. I already push down approx distinct count in #4614 . But we cannot apply aggregation on array rows when its scheme is not nested. This is decided by the calcite architecture.