sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] GROUP BY does not work properly with array values

Open normanj-bitquill opened this issue 1 year ago • 1 comments

What is the bug? If a field that contains array values is used in a GROUP BY clause, each value in the array is considered as a separate row value.

Consider an index with the following data:

{1, [1, 2]}
{2, [3, 4]}
{3, [1, 5]}
{4, [1, 2]}
{5, [2, 3]}

and this query:

SELECT COUNT(x), y FROM test3 GROUP BY y;
{3, 1}
{3, 2}
{2, 3}
{1, 4}
{1, 5}

The expected results are:

{2, [1, 2]}
{1, [3, 4]}
{1, [1, 5]}
{1, [2, 3]}

How can one reproduce the bug? Steps to reproduce the behavior:

  1. Load the data above into a new index
  2. Run the query above on the index

What is the expected behavior? Preserve the array value and perform comparisons on the array value as a whole. Only group rows that have the same array values.

What is your host/environment?

  • OS: MacOS (should not matter)
  • Version [e.g. 3.0 code base]
  • Plugins: SQL plugin

Do you have any screenshots? N/A

Do you have any additional context? Issue #1300 had a change recently merged in that allows array values to be used in query evaluation and in the result set.

normanj-bitquill avatar Oct 28 '24 22:10 normanj-bitquill

This appears to be due to how the OpenSearch engine performs grouping on multi-valued fields.

POST test3/_search
{
  "from" : 0,
  "size" : 10000,
  "timeout" : "1m",
  "aggregations" : {
    "composite_buckets" : {
      "composite" : {
        "size" : 1000,
        "sources" : [ {
          "y" : {
            "terms" : {
              "field" : "y",
              "missing_bucket" : true,
              "missing_order" : "first",
              "order" : "asc"
            }
          }
        } ]
      },
      "aggregations" : {
        "MIN(x)" : {
          "min" : {
            "field" : "x"
          }
        }
      }
    }
  }
}
{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test3",
        "_id": "ATXWtZIB72QJDYYHLMzM",
        "_score": 1,
        "_source": {
          "x": 1,
          "y": [
            1,
            2
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "AjXWtZIB72QJDYYHXszf",
        "_score": 1,
        "_source": {
          "x": 2,
          "y": [
            3,
            4
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "AzXWtZIB72QJDYYHqczI",
        "_score": 1,
        "_source": {
          "x": 3,
          "y": [
            1,
            5
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "BDXWtZIB72QJDYYHy8wa",
        "_score": 1,
        "_source": {
          "x": 4,
          "y": [
            1,
            2
          ]
        }
      },
      {
        "_index": "test3",
        "_id": "Q6i4xJIBxt0sLj_lfLXQ",
        "_score": 1,
        "_source": {
          "x": 5,
          "y": [
            2,
            3
          ]
        }
      }
    ]
  },
  "aggregations": {
    "composite_buckets": {
      "after_key": {
        "y": 5
      },
      "buckets": [
        {
          "key": {
            "y": 1
          },
          "doc_count": 3,
          "MIN(x)": {
            "value": 1
          }
        },
        {
          "key": {
            "y": 2
          },
          "doc_count": 3,
          "MIN(x)": {
            "value": 1
          }
        },
        {
          "key": {
            "y": 3
          },
          "doc_count": 2,
          "MIN(x)": {
            "value": 2
          }
        },
        {
          "key": {
            "y": 4
          },
          "doc_count": 1,
          "MIN(x)": {
            "value": 2
          }
        },
        {
          "key": {
            "y": 5
          },
          "doc_count": 1,
          "MIN(x)": {
            "value": 3
          }
        }
      ]
    }
  }
}

normanj-bitquill avatar Oct 30 '24 17:10 normanj-bitquill

[Catch All Triage - 1, 2, 3, 4, 5]

andrross avatar Nov 18 '24 17:11 andrross