using `OR` in a search query returns error `too_many_nested_clauses`

Open drewmiranda-gl opened this issue 3 years ago • 8 comments

Executing a search query containing OR returns the following error:

Unable to perform search query: OpenSearch exception [type=too_many_nested_clauses, reason=Query contains too many nested clauses; maxClauseCount is set to 1024].

image

I can somewhat replicate this by querying OpenSearch directly, for example: /*/_search?q=Allow%20OR%20Deny. What's interesting is that some indices return results without any issue, while others return the too_many_nested_clauses error.

image

Limiting my Graylog query to streams that return results without issue works correctly.

Below are all the indices for which OpenSearch returned exceptions when using OR in a search query:

gl_linux_auditbeat_150	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_151	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_152	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_153	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_154	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_155	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_156	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_157	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_158	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_159	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_133	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_138	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_139	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_140	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_143	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_144	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_145	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_146	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_147	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_148	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_150	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_151	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_153	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_154	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_155	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_157	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_158	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_132	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_137	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_143	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_144	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_146	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_149	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_150	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_153	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_154	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_156	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_157	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024
graylog_2	too_many_nested_clauses	Query contains too many nested clauses; maxClauseCount is set to 1024

Expected Behavior

Graylog returns search results when using the OR statement.

Current Behavior

Graylog doesn't return search results for some streams/indices.

Possible Solution

Unknown. It is not clear whether this is related to https://github.com/opensearch-project/OpenSearch/issues/3652, but the fact that this can be replicated independently of Graylog could mean it's an issue with OpenSearch and not Graylog.

Steps to Reproduce (for bugs)

  1. Execute a search query for Deny OR Allow

Context

I was testing various Sigma rules and encountered this when a rule (Django Framework Exceptions) generated a query with several OR statements, which produced the error described above.

Your Environment

  • Graylog Version: 5.0.1
  • Java Version: 17.0.5
  • Elasticsearch Version: OpenSearch 2.4.1
  • MongoDB Version: 5.0.14
  • Operating System: Ubuntu Server 20.04 LTS
  • Browser version: Chrome 108.0.5359.124

drewmiranda-gl avatar Dec 20 '22 22:12 drewmiranda-gl

Hi Drew, Thank you for reporting this issue. I think you are right that it's related to the linked OS issue. It is not really a bug in GL or OS, it's actually working as intended.

The problematic part is the query itself: allow OR denny, which, without any fields mentioned, will expand to all available fields. You can verify that with the validate API:

http://localhost:9200/graylog_242/_validate/query?explain=true&q=allow OR denny

You will get the fully expanded query, which indeed uses all available fields.

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "graylog_242",
      "valid": true,
      "explanation": "(http_method:allow | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: 
\"allow\"]\") | gl2_source_node:allow | source:allow | gl2_message_id:allow | controller:allow | resource:allow | message:allow | gl2_source_input:allow | streams:allow | action:allow | full_message:allow) (streams:denny | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed 
[sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | gl2_source_node:denny | source:denny | gl2_message_id:denny | action:denny | controller:denny | http_method:denny | message:denny | resource:denny | full_message:denny | gl2_source_input:denny)"
    }
  ]
}

(The number_format_exception clauses are caused by a numeric field type being asked to match a string value, but you can see the pattern.)
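A minimal Python sketch of how this check could be scripted. The helper names `validate_url` and `count_clauses` are hypothetical, and the count rests on the assumption that every per-field clause in the `explanation` string is separated by ` | `, as in the output above.

```python
from urllib.parse import quote, urlencode


def validate_url(base: str, index: str, query: str) -> str:
    """Build a _validate/query URL with the query string percent-encoded."""
    params = urlencode({"explain": "true", "q": query}, quote_via=quote)
    return f"{base}/{index}/_validate/query?{params}"


def count_clauses(explanation: str, n_terms: int) -> int:
    """Estimate the number of per-field clauses in an explanation string.

    Assumption: each search term expands to one parenthesised group whose
    clauses are joined by " | ", so clauses = separators + number of groups.
    """
    return explanation.count(" | ") + n_terms


url = validate_url("http://localhost:9200", "graylog_242", "allow OR denny")
sample = "(a:allow | b:allow | c:allow) (a:denny | b:denny | c:denny)"
clauses = count_clauses(sample, n_terms=2)
```

Comparing the resulting count against the configured maxClauseCount gives a rough idea of how close a given index is to the limit.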

This also explains why you observe the behaviour for some indices and not for others. Given your query allow OR denny, this will expand for each field as two queries (http_method:allow | http_method:denny). So if your index has more than 512 fields mapped, you'll automatically hit the maxClauseCount set by default to 1024.
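The arithmetic can be sketched in a few lines of Python. This is only an approximation, assuming one clause per mapped field per search term; the function names are hypothetical, and the exact way Lucene counts nested clauses may differ.

```python
MAX_CLAUSE_COUNT = 1024  # default limit quoted in the error message


def estimated_clauses(n_fields: int, n_terms: int) -> int:
    """Each fieldless term expands to one clause per mapped field."""
    return n_fields * n_terms


def exceeds_limit(n_fields: int, n_terms: int, limit: int = MAX_CLAUSE_COUNT) -> bool:
    """True when the expanded query would go over the clause limit."""
    return estimated_clauses(n_fields, n_terms) > limit


at_limit = exceeds_limit(512, 2)    # 1024 clauses, exactly at the limit
over_limit = exceeds_limit(513, 2)  # 1026 clauses
```

Under this estimate, a two-term query stays within the default limit at 512 mapped fields and exceeds it from 513 fields onwards.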

I'll discuss that with the search team and let you know if we can prevent this (maybe by using our validation) or if there is a way to better inform users about what's happening and why.

todvora avatar Dec 21 '22 07:12 todvora

For testing, I added indices.query.bool.max_clause_count: 2048 to my opensearch.yml config, and after restarting OpenSearch I can successfully search for my original search term.

However, something more complex, like the Django Framework Exceptions Sigma rule, has a large number of ORs and no field specified:

(/SuspiciousOperation/ OR /DisallowedHost/ OR /DisallowedModelAdminLookup/ OR /DisallowedModelAdminToField/ OR /DisallowedRedirect/ OR /InvalidSessionKey/ OR /RequestDataTooBig/ OR /SuspiciousFileOperation/ OR /SuspiciousMultipartForm/ OR /SuspiciousSession/ OR /TooManyFieldsSent/ OR /PermissionDenied/)

We can see in the detection section of the rule that no field is specified, which is a totally valid sigma rule:

detection:
    keywords:
        - SuspiciousOperation
        # Subclasses of SuspiciousOperation
        - DisallowedHost
        - DisallowedModelAdminLookup
        - DisallowedModelAdminToField
        - DisallowedRedirect
        - InvalidSessionKey
        - RequestDataTooBig
        - SuspiciousFileOperation
        - SuspiciousMultipartForm
        - SuspiciousSession
        - TooManyFieldsSent
        # Further security-related exceptions
        - PermissionDenied
    condition: keywords
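Since the expansion only happens when no field is given, one possible way to keep such a query small is to scope the keywords to a single field. The Python sketch below is an illustration only: the helper `scoped_query` is hypothetical, and choosing `message` as the target field is an assumption that changes the semantics of the rule, which is meant to match keywords in any field.

```python
def scoped_query(keywords: list[str], field: str = "message") -> str:
    """Join keywords as regex terms and restrict them to one field.

    The result has one clause per keyword instead of one clause per
    keyword per mapped field.
    """
    terms = " OR ".join(f"/{keyword}/" for keyword in keywords)
    return f"{field}:({terms})"


query = scoped_query(["SuspiciousOperation", "DisallowedHost", "PermissionDenied"])
```

With the twelve keywords of the rule above, a query scoped this way contains twelve clauses regardless of how many fields the index has mapped.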

drewmiranda-gl avatar Dec 21 '22 13:12 drewmiranda-gl

Do we know which 2.x open-search version introduces this?

tellistone avatar Jan 12 '23 11:01 tellistone

Do we know which 2.x open-search version introduces this?

Pretty sure it's 2.0, which introduced Lucene 9 support.

image

See also https://issues.apache.org/jira/browse/LUCENE-8811 (linked from the above GitHub issue). Not entirely a bug so much as an active choice made by Lucene, which is used by OpenSearch, which is used by Graylog. (It really is turtles all the way down, innit?)

drewmiranda-gl avatar Jan 12 '23 23:01 drewmiranda-gl

Some more background information:

The too_many_nested_clauses error is a safeguard to avoid too much resource usage for OpenSearch. An Elastic engineer reported the LUCENE-8811 bug, which was probably triggered by issues Elastic ran into.

The hard limit was introduced in Elasticsearch 7.

image

Source: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html

The indices.query.bool.max_clause_count setting in Elasticsearch 7 has a default value of 1024. That's what OpenSearch inherited from the Elasticsearch 7 code base.

image

Source: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-settings.html

Elasticsearch 8 deprecated the indices.query.bool.max_clause_count setting in favor of dynamically computing the value based on a node's available resources.

image

Source: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/search-settings.html

There was an attempt to make this setting dynamic in OpenSearch, but it received pushback and was rejected.

  • Summary: https://github.com/opensearch-project/OpenSearch/issues/1526#issuecomment-972436975
  • More detailed: https://github.com/opensearch-project/OpenSearch/pull/1527#pullrequestreview-809044909

bernd avatar Jan 13 '23 08:01 bernd

Now that we've upgraded Graylog Cloud to OS 2.11, this appears to have started hitting us.

Indices around the 1000 field count (e.g. indices receiving data from Auditbeat and Winlogbeat) seem to be most impacted. On an index with 1000 fields in Cloud, I can't use two string search clauses in one search without hitting an error:

image

image

This only appears to impact string searches: checking fields for specific values works fine, and a single string search clause works fine.

E.g.

image

You can reproduce this on https://graylog-internal-ng.graylog.cloud/ by performing aggregation searches against the following streams:

image

This is a high-impact problem: customers will want to do this, the interface certainly allows them to do this, and then the dashboard element will fail. 1000-field indices are fairly common in the wild, since customers tend to use the default Beats configs until they learn better. It has certainly impacted our customer support dashboards in graylog-internal-ng. Presumably it also breaks Sigma rules and possibly Illuminate content running on indices with many fields.

Reading https://github.com/opensearch-project/OpenSearch/issues/3652, if I'm following this right, maxClauseCount was made dynamic in OS 1.1 based on heap and CPU core count, then changed to be static at 1024 in OS 2.0+.

tellistone avatar Jan 03 '24 11:01 tellistone

We are starting to see this issue too now with Graylog 5.1.11 and OpenSearch 2.11.1.

ed-ud avatar Feb 16 '24 21:02 ed-ud

Hey @ed-ud,

Thanks for reporting this! Can you elaborate a bit on the circumstances under which you are seeing it? Is raising the limit in the OpenSearch config an option for you?

dennisoelkers avatar Feb 21 '24 15:02 dennisoelkers