using `OR` in a search query returns error `too_many_nested_clauses`
Executing a search query containing OR returns the following error:
Unable to perform search query: OpenSearch exception [type=too_many_nested_clauses, reason=Query contains too many nested clauses; maxClauseCount is set to 1024].

I can somewhat replicate this by querying OpenSearch directly, for example: /*/_search?q=Allow%20OR%20Deny . What's interesting is that some indices return results without any issue while others return the too_many_nested_clauses error.

Limiting my Graylog query to streams that return results without issue works correctly.
Below are all the indexes that opensearch returned exceptions for when using OR in a search query:
gl_linux_auditbeat_150 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_151 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_152 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_153 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_154 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_155 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_156 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_157 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_158 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_linux_auditbeat_159 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_133 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_138 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_139 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_140 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_143 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_144 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_145 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_146 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_147 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_148 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_150 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_151 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_153 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_154 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_155 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_157 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_common_158 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_132 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_137 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_143 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_144 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_146 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_149 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_150 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_153 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_154 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_156 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
gl_windows_security_157 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
graylog_2 too_many_nested_clauses Query contains too many nested clauses; maxClauseCount is set to 1024
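A list like the one above can be pulled out of a search response programmatically. The following is a minimal sketch, assuming partial failures are reported under `_shards.failures` with `index` and `reason.type` keys; the function name and the hand-written sample response are illustrative and not taken from this cluster.

```python
def failing_indices(search_response, error_type="too_many_nested_clauses"):
    """Return the sorted, de-duplicated index names that failed with error_type."""
    failures = search_response.get("_shards", {}).get("failures", [])
    return sorted({
        failure["index"]
        for failure in failures
        if "index" in failure
        and failure.get("reason", {}).get("type") == error_type
    })


# Hand-written sample in the assumed response shape (not real cluster output).
sample_response = {
    "_shards": {
        "total": 3,
        "successful": 1,
        "failed": 2,
        "failures": [
            {"shard": 0, "index": "graylog_2",
             "reason": {"type": "too_many_nested_clauses",
                        "reason": "Query contains too many nested clauses"}},
            {"shard": 0, "index": "gl_windows_common_133",
             "reason": {"type": "too_many_nested_clauses",
                        "reason": "Query contains too many nested clauses"}},
        ],
    }
}

print(failing_indices(sample_response))
```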
Expected Behavior
Graylog returns search results when using the OR statement.
Current Behavior
Graylog doesn't return search results for some streams/indices.
Possible Solution
Unknown. It is not clear whether this is related to https://github.com/opensearch-project/OpenSearch/issues/3652, but the fact that this can be replicated independently of Graylog could mean it's an issue with OpenSearch and not Graylog.
Steps to Reproduce (for bugs)
- Execute a search query for
Deny OR Allow
Context
I was testing various Sigma rules and encountered this when a rule (Django Framework Exceptions) generated a query with several OR statements, which triggered the error described above.
Your Environment
- Graylog Version: 5.0.1
- Java Version: 17.0.5
- Elasticsearch Version: OpenSearch 2.4.1
- MongoDB Version: 5.0.14
- Operating System: Ubuntu Server 20.04 LTS
- Browser version: Chrome 108.0.5359.124
Hi Drew, Thank you for reporting this issue. I think you are right that it's related to the linked OS issue. It is not really a bug in GL or OS, it's actually working as intended.
The problematic part is the query itself: allow OR denny, which, without any fields mentioned, will expand to all available fields. You can verify that with the validate API:
http://localhost:9200/graylog_242/_validate/query?explain=true&q=allow OR denny
You will get the fully expanded query, which indeed uses all available fields.
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"valid": true,
"explanations": [
{
"index": "graylog_242",
"valid": true,
"explanation": "(http_method:allow | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"allow\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: 
\"allow\"]\") | gl2_source_node:allow | source:allow | gl2_message_id:allow | controller:allow | resource:allow | message:allow | gl2_source_input:allow | streams:allow | action:allow | full_message:allow) (streams:denny | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed 
[sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | MatchNoDocsQuery(\"failed [sequence_nr] query, caused by number_format_exception:[For input string: \"denny\"]\") | gl2_source_node:denny | source:denny | gl2_message_id:denny | action:denny | controller:denny | http_method:denny | message:denny | resource:denny | full_message:denny | gl2_source_input:denny)"
}
]
}
(the number_format_exception clauses are caused by a numeric field type trying to accept a string value, but you see the pattern)
This also explains why you observe the behaviour for some indices and not for others. Given your query allow OR denny, this will expand for each field as two queries (http_method:allow | http_method:denny). So if your index has more than 512 fields mapped, you'll automatically hit the maxClauseCount set by default to 1024.
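The arithmetic above can be sketched in a few lines of Python. This is a rough estimate that assumes one clause per search term per mapped field and ignores any wrapper clauses OpenSearch may count on top; the function names are illustrative.

```python
DEFAULT_MAX_CLAUSE_COUNT = 1024  # value reported in the error message


def estimated_clauses(num_terms, num_fields):
    """Estimate clauses when every field-less term expands to all fields."""
    return num_terms * num_fields


def max_fields(num_terms, limit=DEFAULT_MAX_CLAUSE_COUNT):
    """Largest field count whose estimate still fits within the limit."""
    return limit // num_terms


print(estimated_clauses(2, 512))  # 1024, right at the default limit
print(estimated_clauses(2, 513))  # 1026, over the default limit
print(max_fields(2))              # 512
```

Under the same assumption, every additional field-less term lowers the number of fields an index can have before the query fails.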
I'll discuss that in the search team and let you know if we can prevent this (maybe by using our validation) or if there is any way to better inform users what's happening and why.
For testing I added indices.query.bool.max_clause_count: 2048 to my opensearch.yml config and after restarting OpenSearch I can successfully search for my original search term.
However, something more complex, like the Django Framework Exceptions Sigma rule, has a large number of ORs and no field specified.
(/SuspiciousOperation/ OR /DisallowedHost/ OR /DisallowedModelAdminLookup/ OR /DisallowedModelAdminToField/ OR /DisallowedRedirect/ OR /InvalidSessionKey/ OR /RequestDataTooBig/ OR /SuspiciousFileOperation/ OR /SuspiciousMultipartForm/ OR /SuspiciousSession/ OR /TooManyFieldsSent/ OR /PermissionDenied/)
We can see in the detection section of the rule that no field is specified, which is a totally valid sigma rule:
detection:
    keywords:
        - SuspiciousOperation
        # Subclasses of SuspiciousOperation
        - DisallowedHost
        - DisallowedModelAdminLookup
        - DisallowedModelAdminToField
        - DisallowedRedirect
        - InvalidSessionKey
        - RequestDataTooBig
        - SuspiciousFileOperation
        - SuspiciousMultipartForm
        - SuspiciousSession
        - TooManyFieldsSent
        # Further security-related exceptions
        - PermissionDenied
    condition: keywords
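Since field-less terms expand across every mapped field, one possible workaround is to scope the generated keywords to a single field. The sketch below illustrates that idea under stated assumptions and is not what the Sigma backend does: the helper name and the choice of `message` as the target field are hypothetical, and scoping narrows what the rule can match.

```python
DJANGO_KEYWORDS = [
    "SuspiciousOperation",
    "DisallowedHost",
    "DisallowedModelAdminLookup",
    "DisallowedModelAdminToField",
    "DisallowedRedirect",
    "InvalidSessionKey",
    "RequestDataTooBig",
    "SuspiciousFileOperation",
    "SuspiciousMultipartForm",
    "SuspiciousSession",
    "TooManyFieldsSent",
    "PermissionDenied",
]


def scoped_query(field, keywords):
    """Build a query string that searches only `field` for any keyword."""
    alternatives = " OR ".join(f"/{keyword}/" for keyword in keywords)
    return f"{field}:({alternatives})"


# `message` is an assumed target field; pick whichever field holds the log text.
query = scoped_query("message", DJANGO_KEYWORDS)
print(query)

# With 12 field-less keywords and a limit of 1024, the rough budget is
# 1024 // 12 = 85 mapped fields before the unscoped query fails.
print(1024 // len(DJANGO_KEYWORDS))  # 85
```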
Do we know which 2.x OpenSearch version introduced this?
Pretty sure it's 2.0, which introduced Lucene 9 support.

See also https://issues.apache.org/jira/browse/LUCENE-8811 (linked from the above GitHub issue). Not a bug so much as an active choice made by Lucene, which is used by OpenSearch, which is used by Graylog. (it really is turtles all the way down, innit?)
Some more background information:
The too_many_nested_clauses error is a safeguard to avoid too much resource usage for OpenSearch. An Elastic engineer reported the LUCENE-8811 bug, which was probably triggered by issues Elastic ran into.
The hard limit was introduced in Elasticsearch 7.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl-query-string-query.html
The indices.query.bool.max_clause_count setting in Elasticsearch 7 has a default value of 1024. That's what OpenSearch inherited from the Elasticsearch 7 code base.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/search-settings.html
Elasticsearch 8 deprecated the indices.query.bool.max_clause_count setting in favor of dynamically computing the value based on a node's available resources.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/8.6/search-settings.html
There was an attempt to make this setting dynamic in OpenSearch, but it got pushback and was rejected.
- Summary: https://github.com/opensearch-project/OpenSearch/issues/1526#issuecomment-972436975
- More detailed: https://github.com/opensearch-project/OpenSearch/pull/1527#pullrequestreview-809044909
Now that we've upgraded Graylog Cloud to OS 2.11, this appears to have started hitting us.
Indices around the 1000 field count (e.g. indices receiving data from Auditbeat and Winlogbeat) seem to be most impacted. On an index with 1000 fields in Cloud, I can't use two string search clauses in one search without hitting an error.
This only appears to impact string searches: checking fields for specific values works fine, and a single string search clause works fine.
You can reproduce this on https://graylog-internal-ng.graylog.cloud/ by performing aggregation searches against the following streams:
This is a high-impact problem: customers will want to do this, the interface certainly allows them to do this, and then the dashboard element will fail. Indices with 1000 fields are fairly common in the wild, since customers tend to use the default Beats configs until they learn better. It has certainly impacted our customer support dashboards in graylog-internal-ng. Presumably it also breaks Sigma rules and possibly Illuminate content running on indices with many fields.
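To check which indices are at risk, the mapped fields can be counted from an index mapping. The following is a minimal sketch, assuming the usual mapping layout of nested `properties` for object fields and `fields` for multi-fields; the function name and the sample mapping are illustrative, and the count is only an approximation of what a field-less query expands to.

```python
def count_fields(properties):
    """Count leaf fields and multi-fields in a mapping `properties` dict."""
    total = 0
    for definition in properties.values():
        if "properties" in definition:
            # Object field: count its children instead of the object itself.
            total += count_fields(definition["properties"])
        else:
            total += 1
        # Multi-fields (for example a `.keyword` sub-field) are searchable too.
        total += len(definition.get("fields", {}))
    return total


# Hand-written sample mapping (not taken from a real index).
sample_properties = {
    "message": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "source": {"type": "keyword"},
    "winlog": {
        "properties": {
            "event_id": {"type": "long"},
            "channel": {"type": "keyword"},
        }
    },
}

print(count_fields(sample_properties))  # 5
```

An index whose count approaches half of the clause limit is likely to fail as soon as a search contains two field-less terms.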
Reading https://github.com/opensearch-project/OpenSearch/issues/3652 - if I'm following this right, maxClauseCount was made dynamic in OS 1.1 based on heap size and CPU core count, then changed to be static at 1024 in OS 2.0+.
We are starting to see this issue too now with Graylog 5.1.11 and OpenSearch 2.11.1.
Hey @ed-ud,
thanks for reporting this! Can you elaborate a bit on the circumstances under which you are seeing it? Is raising the limit in the OpenSearch config an option for you?