sql icon indicating copy to clipboard operation
sql copied to clipboard

Push down filters on nested fields as nested queries

Open yuancu opened this issue 2 months ago • 2 comments

Description

Because nested fields are indexed as hidden documents, we cannot query them directly. Instead, we have to use the nested query to access them.

For example, for the following mapping:

{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "items": {
        "type": "nested",
        "properties": {
          "name": {"type": "keyword"}
        }
      }
    }
  }
}

if we want to use term query to match those items with name banana, instead of using:

{
  "query": {
    "term": {
      "items.name": {
        "value": "banana",
        "boost": 1
      }
    }
  }
}

We should use

{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "term": {
          "items.name": {
            "value": "banana",
            "boost": 1
          }
        }
      }
    }
  }
}

Here, the nested clause steps down into the nested items field. It no longer has access to fields in the root document, nor fields in any other nested document.

Work items

  • [x] Fix the case in the issue #4508: script's access to a single nested field.
  • [x] Support querying deep nested objects: if a field of a nested object is another nested object, we should construct a nested query using the deepest level
  • [x] Support nested query in other types of filter.
    • [x] Term query
    • [x] Range query
    • [x] Multi-terms query
  • [x] ~~Support filters on both nested and root objects, where they can be separated. E.g. | where items.name = 'banana' and id = 2 can be separated to two filters: items.name = 'banana' stays under a nested query, while id = 2 stays on a normal term query, combined with a bool query.~~ Already implemented.
  • [x] Support filters on nested and root objects, where they can not be separated

Related Issues

Resolves #4508

Check List

  • [x] New functionality includes testing.
  • [ ] New functionality has been documented.
  • [ ] New functionality has javadoc added.
  • [ ] New functionality has a user manual doc added.
  • [ ] New PPL command checklist all confirmed.
  • [ ] API changes companion pull request created.
  • [x] Commits are signed per the DCO using --signoff or -s.
  • [ ] Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Summary by CodeRabbit

  • New Features

    • Nested field filtering now supports computed fields and multiple filter conditions
    • Cascaded nested field hierarchies are fully supported in queries
    • Enhanced query optimization for filters and aggregations on nested fields
  • Improvements

    • Better handling of complex nested structures in query execution and pushdown optimization

✏️ Tip: You can customize this high-level summary in your review settings.

yuancu avatar Nov 19 '25 07:11 yuancu

Walkthrough

The changes implement comprehensive support for filtering on nested fields in OpenSearch SQL queries. New test cases validate filtering scenarios including computed nested fields, combined nested and root field filters, and cascaded nested hierarchies. Core predicate analysis logic is enhanced to detect nested field paths and wrap filters in proper OpenSearch nested query DSL, while test indices and expected output fixtures are added to support these validation scenarios.

Changes

Cohort / File(s) Summary
Test Implementation
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java, integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteWhereCommandIT.java
Add five new test methods for nested field filtering: testFilterOnComputedNestedFields, testFilterOnNestedAndRootFields, testFilterOnNestedFields, testFilterOnMultipleCascadedNestedFields, testAggFilterOnNestedFields. CalciteWhereCommandIT also adds testScriptFilterOnDifferentNestedHierarchyShouldThrow. Both update init() to load NESTED_SIMPLE, DEEP_NESTED, and CASCADED_NESTED indices.
Test Infrastructure & Constants
integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java, integ-test/src/test/java/org/opensearch/sql/legacy/TestsConstants.java
Add CASCADED_NESTED enum constant to Index with mapping and data files; introduce TEST_INDEX_CASCADED_NESTED string constant.
Test Data & Mapping
integ-test/src/test/resources/cascaded_nested.json, integ-test/src/test/resources/indexDefinitions/cascaded_nested_index_mapping.json
Introduce cascaded nested test dataset (three author documents with nested books and reviews) and corresponding ES mapping defining nested author, books (nested), and reviews (nested) structures.
Expected Output — Calcite Pushdown
integ-test/src/test/resources/expectedOutput/calcite/filter_computed_nested.yaml, filter_nested_term.yaml, filter_nested_terms.yaml, filter_root_and_nested.yaml, filter_multiple_nested_cascaded_range.yaml, agg_filter_nested.yaml
Add six YAML fixtures containing expected Calcite logical and physical plans for nested filter pushdown scenarios, including script queries, nested DSL wrapping, and aggregation pushdown contexts.
Expected Output — Calcite No-Pushdown
integ-test/src/test/resources/expectedOutput/calcite_no_pushdown/filter_computed_nested.yaml, filter_nested_term.yaml, filter_nested_terms.yaml, filter_root_and_nested.yaml, filter_multiple_nested_cascaded_range.yaml, agg_filter_nested.yaml
Add six YAML fixtures containing expected Calcite logical and physical plans for nested filter scenarios without pushdown optimization, showing EnumerableCalc and EnumerableAggregate operators.
REST API Integration Test
integ-test/src/yamlRestTest/resources/rest-api-spec/test/issues/4508.yml
Add REST API test suite validating nested field evaluation with LENGTH() compute, filtering on computed fields, and mixed nested/root-level field conditions.
Core Predicate Analysis
opensearch/src/main/java/org/opensearch/sql/opensearch/request/PredicateAnalyzer.java
Enhance NamedFieldExpression with nestedPath tracking; add resolveNestedPath() utility for nested field detection; update SimpleQueryExpression and ScriptQueryExpression builders to wrap filters with nestedQuery DSL when nested paths are detected; add referredFields and fieldTypes tracking to ScriptQueryExpression.
Index Scan Filter Pushdown
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
Modify pushDownFilter fieldTypes calculation to include all index field types instead of filtering to current schema fields only.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Calcite
    participant PredicateAnalyzer
    participant FilterBuilder
    participant OpenSearch as OpenSearch Query DSL

    Note over Calcite,OpenSearch: Filter on Nested Field Processing

    Calcite->>PredicateAnalyzer: analyzeFilter(field: "items.name", operator, value)
    PredicateAnalyzer->>PredicateAnalyzer: resolveNestedPath("items.name", fieldTypes)
    Note over PredicateAnalyzer: Detect nested path: "items"
    
    alt Simple Filter (term/range)
        PredicateAnalyzer->>FilterBuilder: buildSimpleQuery(condition)
        Note over FilterBuilder: Create term/range query on items.name
        FilterBuilder->>FilterBuilder: wrapInNestedQuery(query, path="items")
        FilterBuilder-->>OpenSearch: {nested: {path: "items", query: {...}}}
    else Script Filter (computed field)
        PredicateAnalyzer->>FilterBuilder: buildScriptQuery(expression: "LENGTH(items.name) > 5")
        Note over FilterBuilder: Create script expression<br/>tracking referredFields
        FilterBuilder->>FilterBuilder: Detect single nested path from referredFields
        FilterBuilder->>FilterBuilder: wrapInNestedQuery(scriptQuery, path="items")
        FilterBuilder-->>OpenSearch: {nested: {path: "items", query: {script: {...}}}}
    else Multiple Nested Paths
        PredicateAnalyzer->>PredicateAnalyzer: Detect paths: ["items", "metadata"]
        PredicateAnalyzer-->>PredicateAnalyzer: throw UnsupportedScriptException
    end

    OpenSearch->>OpenSearch: Evaluate nested query with proper semantics

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • PredicateAnalyzer.java: Core nested path resolution and query wrapping logic requires careful validation for correctness across simple filters, script queries, and aggregations; nested path detection and multiple-path error handling logic must be thoroughly reviewed
  • Cascaded nested test scenarios: Multi-level nested hierarchies (author → books → reviews) with filtering across different levels require validation that nested DSL wrapping applies correct path semantics
  • Script query nested wrapping: Ensure referredFields tracking correctly identifies all nested field dependencies and that script wrapping logic handles edge cases (e.g., mixing nested and root fields)
  • Expected output YAML fixtures: Verify physical plan pushdown contexts accurately reflect nested query DSL structure, particularly complex scenarios like computed filters and aggregation filters over nested fields

Poem

🐰 In nested fields we dig so deep, Where reviews sleep and books they keep, Now scripts don't fail when paths are far, DSL wraps queries like a star! ⭐ Cascaded hierarchies at last, Filtering shadows of the past!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: pushing down filters on nested fields as nested queries, which aligns with the PR's core objective to fix filtering on nested fields.
Linked Issues check ✅ Passed The PR implements the core objective from #4508: generating proper OpenSearch nested query DSL instead of script queries when expressions reference nested fields, addressing the bug where filters on computed nested fields returned zero results.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the PR objectives: adding test coverage for nested field filtering, implementing nested query generation in PredicateAnalyzer, and extending field type handling in CalciteLogicalIndexScan for proper nested path resolution.
✨ Finishing touches
  • [ ] 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • [ ] Create PR with unit tests
  • [ ] Post copyable unit tests in a comment

[!TIP]

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions: | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context. Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot] avatar Nov 27 '25 02:11 coderabbitai[bot]

This PR is stalled because it has been open for 2 weeks with no activity.