sql icon indicating copy to clipboard operation
sql copied to clipboard

[BUG] PPL aggregation operations (stats, top) return null for nested field values

Open alexey-temnikov opened this issue 4 months ago • 0 comments

Query Information

PPL Command/Query:

source=test-nested-agg | stats COUNT() by request.path
source=test-nested-agg | top request.path

Expected Result: Aggregation operations should return grouped counts by the nested field values, similar to how fields command successfully retrieves the values.

Actual Result:

  • stats COUNT() by nested.field returns a single row with null as the grouping field value
  • top nested.field returns null with total count
  • fields nested.field works correctly (returns all values)
{
  "schema": [
    {"name": "COUNT()", "type": "bigint"},
    {"name": "request.path", "type": "string"}
  ],
  "datarows": [
    [10, null]
  ]
}

Dataset Information

Dataset/Schema Type

  • [x] Open Cybersecurity Schema Framework (OCSF)
  • [x] OpenTelemetry (OTEL)
  • [x] Custom (minimal reproduction below)

Index Mapping

{
  "mappings": {
    "properties": {
      "product": {"type": "keyword"},
      "request": {
        "type": "nested",
        "properties": {
          "path": {"type": "keyword"},
          "method": {"type": "keyword"}
        }
      }
    }
  }
}

Sample Data

{"product":"eks","request":{"path":"/api/v1/pods","method":"GET"}}
{"product":"eks","request":{"path":"/api/v1/services","method":"GET"}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"POST"}}

Bug Description

Issue Summary: PPL aggregation operations (stats, top) return null for nested field values instead of performing proper aggregation. The fields command works correctly on the same nested fields, confirming the data exists.

Steps to Reproduce:

# 1. Create test index with nested field
curl -X PUT "http://localhost:9200/test-nested-agg" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "product": {"type": "keyword"},
      "request": {
        "type": "nested",
        "properties": {
          "path": {"type": "keyword"},
          "method": {"type": "keyword"}
        }
      }
    }
  }
}'

# 2. Insert test data
curl -X POST "http://localhost:9200/test-nested-agg/_bulk" -H 'Content-Type: application/json' -d '
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"GET"}}
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/services","method":"GET"}}
{"index":{}}
{"product":"eks","request":{"path":"/api/v1/pods","method":"POST"}}
'

curl -X POST "http://localhost:9200/test-nested-agg/_refresh"

# 3. Test fields command (works correctly)
curl -X POST "http://localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
  "query": "source=test-nested-agg | fields request.path"
}'
# Returns: 3 rows with values ✅

# 4. Test aggregation (fails - returns null)
curl -X POST "http://localhost:9200/_plugins/_ppl" -H 'Content-Type: application/json' -d '{
  "query": "source=test-nested-agg | stats COUNT() by request.path"
}'
# Returns: 1 row with null ❌

Environment Information

OpenSearch Version: OpenSearch 3.4.0-SNAPSHOT (also affects 3.3.0)

Root Cause Analysis

Tentative Root Cause:

Note: This is a preliminary analysis and requires further investigation.

OpenSearch requires a nested aggregation wrapper to aggregate on nested fields. PPL/SQL generates standard aggregations without this wrapper.

Verification with OpenSearch DSL:

Without nested wrapper (what PPL generates):

curl -X POST "http://localhost:9200/test-nested-agg/_search?size=0" -H 'Content-Type: application/json' -d '{
  "aggs": {
    "by_path": {
      "terms": {"field": "request.path"}
    }
  }
}'
# Result: Empty buckets ❌

With nested wrapper (correct approach):

curl -X POST "http://localhost:9200/test-nested-agg/_search?size=0" -H 'Content-Type: application/json' -d '{
  "aggs": {
    "nested_request": {
      "nested": {"path": "request"},
      "aggs": {
        "by_path": {
          "terms": {"field": "request.path"}
        }
      }
    }
  }
}'
# Result: Correct aggregation with 2 buckets ✅

Code Location:

The issue is in the aggregation generation layer:

This is a preliminary analysis and requires further investigation.

Related Issues

This bug is part of the broader nested field support gap:

  • #2813 - [FEATURE] Support nested aggregation (parent feature request)
  • #4487 - [BUG] PPL dedup command returns zero results on nested field paths (query layer)
  • #4482 - [BUG] PPL bin command fails to produce output for nested/struct fields
  • #4508 - [BUG] PPL filter on computed fields from nested paths returns zero results
  • #2740 - [BUG] Rename does not work with nested fields
  • #52 - Nested field query improvement

Note: This issue specifically tracks aggregation operations returning null. Related issues #4487 and #4508 affect the query/filter layer with a similar underlying cause but different manifestations.

alexey-temnikov avatar Oct 15 '25 02:10 alexey-temnikov