sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] `distinct_count_approx` does not work with array types

Open ahkcs opened this issue 4 months ago • 2 comments

Query Information

PPL Query:

source = opensearch_dashboards_sample_data_ecommerce
| stats distinct_count_approx(`manufacturer`) as dc

Expected Result: distinct_count_approx should return an approximate distinct count of the values in the manufacturer field, regardless of whether the field is a single value or an array.

Actual Result: The query fails when the target field is of array type:

{
  "error": {
    "reason": "There was internal problem at backend",
    "details": "java.sql.SQLException: exception while executing query: class java.util.ArrayList cannot be cast to class java.lang.String (java.util.ArrayList and java.lang.String are in module java.base of loader 'bootstrap')",
    "type": "RuntimeException"
  },
  "status": 500
}

Steps to Reproduce:

  1. Create an index with an array-type field (e.g., manufacturer as ["A", "B"]).

  2. Run:

    source = your_index | stats distinct_count_approx(`manufacturer`)
    
  3. Observe that the query fails with a 500 Internal Server Error.

ahkcs avatar Oct 13 '25 21:10 ahkcs

I'm looking on this.

xinyual avatar Oct 20 '25 07:10 xinyual

Previously, it fails since it doesn't pushdown. I already push down approx distinct count in #4614 . But we cannot apply aggregation on array rows when its scheme is not nested. This is decided by the calcite architecture.

xinyual avatar Oct 22 '25 02:10 xinyual