When rotating the main index, the rolling-file logger can consume all resources until Java runs out of memory and Graylog goes down
Expected Behavior
Logging to the rolling-file appender's server.log file should not impair Graylog's performance.
Current Behavior
At some point the main index is rotated. If there are a lot of indexing errors (we have around 40k indexing errors per day), it seems all past errors are thrown again (reindexing?); we got more than 2M of them in our case. The rolling-file log4j2 appender was then effectively DoSed and consumed all Java resources until Graylog died from running out of memory. The UI cannot be reached.
The issue started immediately after:
2022-02-01T09:46:18.903+01:00 INFO [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_3>) should be rotated, Pointing deflector to new index now!
2022-02-01T09:46:18.906+01:00 INFO [MongoIndexSet] Cycling from <graylog_3> to <graylog_4>.
2022-02-01T09:46:18.908+01:00 INFO [MongoIndexSet] Creating target index <graylog_4>.
2022-02-01T09:46:19.152+01:00 INFO [MongoIndexSet] Waiting for allocation of index <graylog_4>.
2022-02-01T09:46:19.158+01:00 INFO [MongoIndexSet] Index <graylog_4> has been successfully allocated.
2022-02-01T09:46:19.160+01:00 INFO [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_4>.
2022-02-01T09:46:19.197+01:00 INFO [SystemJobManager] Submitted SystemJob <6cb827c0-833b-11ec-ba59-02384db95920> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2022-02-01T09:46:19.197+01:00 INFO [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_4>.
The logs are then full of
"ERROR [MessagesAdapterES7] Failed to index "
and the detailed messages:
[0]: index [graylog_4], type [_doc], id [71059070-835d-11ec-9adc-02384db95920], message [ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [node_module] of type [date] in document with id '71059070-835d-11ec-9adc-02384db95920'. Preview of field's value: '[318]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [[318]] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];]
and we see:
INFO [Buffers] Waiting for buffers to drain. (0i/0p/24996o)
until we ultimately crash with:
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
Possible Solution
We tried to allocate more memory to the Java heap for Graylog (1 GB -> 4 GB), with no success.
We also tried to filter out the messages with filters in log4j2.xml, but none of them worked:
<ThresholdFilter level="DEBUG" onMatch="ACCEPT" onMismatch="DENY"/>
<RegexFilter regex=".*ElasticsearchException.*" onMatch="DENY" onMismatch="ACCEPT"/>
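A rate-limiting filter might be worth trying instead: log4j2 ships a BurstFilter that silently drops events once a configured average rate is exceeded. The snippet below is an untested sketch against a stock rolling-file appender definition (the paths are the package defaults, only the BurstFilter line is new, and the rate/maxBurst values are placeholders that would need tuning):
<RollingFile name="rolling-file" fileName="/var/log/graylog-server/server.log"
             filePattern="/var/log/graylog-server/server.log.%i.gz">
    <!-- existing PatternLayout, Policies and rollover strategy kept as-is -->
    <!-- BurstFilter: events at ERROR and below are limited to ~100/s on average,
         with bursts of up to 1000 events allowed; FATAL always passes -->
    <BurstFilter level="ERROR" rate="100" maxBurst="1000"/>
</RollingFile>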
The only way to get Graylog running again was to disable the appender completely:
<!-- <AppenderRef ref="rolling-file"/> -->
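A narrower workaround than disabling file logging entirely might be to raise the level of just the logger that produces the flood. The class name below is taken from the log excerpt above; the package prefix is our assumption and may need adjusting for your Graylog version:
<Loggers>
    <!-- assumed fully-qualified name for the MessagesAdapterES7 logger seen above;
         "fatal" (or "off") keeps its indexing-error spam out of server.log, at the
         cost of also hiding legitimate indexing failures from the file -->
    <Logger name="org.graylog.storage.elasticsearch7.MessagesAdapterES7" level="fatal"/>
    <!-- existing Root logger and other Logger entries kept as-is -->
</Loggers>
Because the level check is applied before the event reaches any appender, such a flood would never be formatted or queued in the first place.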
The file appender should perhaps avoid logging these ElasticsearchExceptions by default. But ultimately, the log appender should not be able to impair Graylog itself.
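One way to limit that impact could be to put the rolling-file appender behind log4j2's Async appender with a bounded, non-blocking queue, so that a log flood gets dropped instead of stalling the threads that emit it. Again an untested sketch; the appender name async-rolling-file is made up here:
<!-- wraps the existing rolling-file appender; with blocking="false", events are
     discarded once the 1024-slot queue is full instead of blocking the caller -->
<Async name="async-rolling-file" bufferSize="1024" blocking="false">
    <AppenderRef ref="rolling-file"/>
</Async>
<!-- loggers would then reference "async-rolling-file" instead of "rolling-file" -->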
Steps to Reproduce (for bugs)
Index a lot of messages that produce a lot of indexing errors, then rotate the main index.
Your Environment
- Graylog Version: 4.2.5
- Java Version: openjdk-17
- Elasticsearch Version: 7.10.2
- MongoDB Version: 4.0.28
- Operating System: Ubuntu 20.04
This sounds like an operational problem.
If this is still an issue, may I ask how much memory the Graylog node is provisioned with, and how much of it is assigned to the JVM heap?