[BUG] GROUP BY does not work properly with array values
What is the bug? If a field that contains array values is used in a GROUP BY clause, each value in the array is considered as a separate row value.
Consider an index with the following data:
{1, [1, 2]}
{2, [3, 4]}
{3, [1, 5]}
{4, [1, 2]}
{5, [2, 3]}
and this query:
SELECT COUNT(x), y FROM test3 GROUP BY y;
{3, 1}
{3, 2}
{2, 3}
{1, 4}
{1, 5}
The expected results are:
{2, [1, 2]}
{1, [3, 4]}
{1, [1, 5]}
{1, [2, 3]}
How can one reproduce the bug? Steps to reproduce the behavior:
- Load the data above into a new index
- Run the query above on the index
What is the expected behavior? Preserve the array value and perform comparisons on the array value as a whole. Only group rows that have the same array values.
What is your host/environment?
- OS: MacOS (should not matter)
- Version [e.g. 3.0 code base]
- Plugins: SQL plugin
Do you have any screenshots? N/A
Do you have any additional context? Issue #1300 had a change recently merged in that allows array values to be used in query evaluation and in the result set.
This appears to be due to how the OpenSearch engine performs grouping on multi-valued fields.
POST test3/_search
{
"from" : 0,
"size" : 10000,
"timeout" : "1m",
"aggregations" : {
"composite_buckets" : {
"composite" : {
"size" : 1000,
"sources" : [ {
"y" : {
"terms" : {
"field" : "y",
"missing_bucket" : true,
"missing_order" : "first",
"order" : "asc"
}
}
} ]
},
"aggregations" : {
"MIN(x)" : {
"min" : {
"field" : "x"
}
}
}
}
}
}
{
"took": 11,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "test3",
"_id": "ATXWtZIB72QJDYYHLMzM",
"_score": 1,
"_source": {
"x": 1,
"y": [
1,
2
]
}
},
{
"_index": "test3",
"_id": "AjXWtZIB72QJDYYHXszf",
"_score": 1,
"_source": {
"x": 2,
"y": [
3,
4
]
}
},
{
"_index": "test3",
"_id": "AzXWtZIB72QJDYYHqczI",
"_score": 1,
"_source": {
"x": 3,
"y": [
1,
5
]
}
},
{
"_index": "test3",
"_id": "BDXWtZIB72QJDYYHy8wa",
"_score": 1,
"_source": {
"x": 4,
"y": [
1,
2
]
}
},
{
"_index": "test3",
"_id": "Q6i4xJIBxt0sLj_lfLXQ",
"_score": 1,
"_source": {
"x": 5,
"y": [
2,
3
]
}
}
]
},
"aggregations": {
"composite_buckets": {
"after_key": {
"y": 5
},
"buckets": [
{
"key": {
"y": 1
},
"doc_count": 3,
"MIN(x)": {
"value": 1
}
},
{
"key": {
"y": 2
},
"doc_count": 3,
"MIN(x)": {
"value": 1
}
},
{
"key": {
"y": 3
},
"doc_count": 2,
"MIN(x)": {
"value": 2
}
},
{
"key": {
"y": 4
},
"doc_count": 1,
"MIN(x)": {
"value": 2
}
},
{
"key": {
"y": 5
},
"doc_count": 1,
"MIN(x)": {
"value": 3
}
}
]
}
}
}