When rotating the main index, the rolling-file logger can consume all resources until Java runs out of memory and Graylog goes down
Expected Behavior
Logging to the rolling-file appender's server.log file should not impair Graylog's performance.
Current Behavior
At some point the main index is rotated. If there are a lot of indexing errors (we have around 40k indexing errors per day), it seems all past errors are thrown again (reindexing?); we got more than 2M of them in our case. The rolling-file log4j2 appender was then effectively DoSed and consumed all Java resources until Graylog died from running out of memory. The UI cannot be reached.
The issue started immediately after:
2022-02-01T09:46:18.903+01:00 INFO [AbstractRotationStrategy] Deflector index <Default index set> (index set <graylog_3>) should be rotated, Pointing deflector to new index now!
2022-02-01T09:46:18.906+01:00 INFO [MongoIndexSet] Cycling from <graylog_3> to <graylog_4>.
2022-02-01T09:46:18.908+01:00 INFO [MongoIndexSet] Creating target index <graylog_4>.
2022-02-01T09:46:19.152+01:00 INFO [MongoIndexSet] Waiting for allocation of index <graylog_4>.
2022-02-01T09:46:19.158+01:00 INFO [MongoIndexSet] Index <graylog_4> has been successfully allocated.
2022-02-01T09:46:19.160+01:00 INFO [MongoIndexSet] Pointing index alias <graylog_deflector> to new index <graylog_4>.
2022-02-01T09:46:19.197+01:00 INFO [SystemJobManager] Submitted SystemJob <6cb827c0-833b-11ec-ba59-02384db95920> [org.graylog2.indexer.indices.jobs.SetIndexReadOnlyAndCalculateRangeJob]
2022-02-01T09:46:19.197+01:00 INFO [MongoIndexSet] Successfully pointed index alias <graylog_deflector> to index <graylog_4>.
The logs are then full of
"ERROR [MessagesAdapterES7] Failed to index "
and the detailed messages:
[0]: index [graylog_4], type [_doc], id [71059070-835d-11ec-9adc-02384db95920], message [ElasticsearchException[Elasticsearch exception [type=mapper_parsing_exception, reason=failed to parse field [node_module] of type [date] in document with id '71059070-835d-11ec-9adc-02384db95920'. Preview of field's value: '[318]']]; nested: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=failed to parse date field [[318]] with format [strict_date_optional_time||epoch_millis]]]; nested: ElasticsearchException[Elasticsearch exception [type=date_time_parse_exception, reason=Failed to parse with all enclosed parsers]];]
and we see:
INFO [Buffers] Waiting for buffers to drain. (0i/0p/24996o)
until we ultimately crash with:
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
Possible Solution
We tried to allocate more memory to the Java heap for Graylog (1 GB -> 4 GB), with no success.
We also tried to filter out the messages with filters in log4j2.xml, but none of them worked:
<ThresholdFilter level="DEBUG" onMatch="ACCEPT" onMismatch="DENY"/>
<RegexFilter regex=".*ElasticsearchException.*" onMatch="DENY" onMismatch="ACCEPT"/>
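A rate-limiting filter might be worth trying instead: log4j2 ships a BurstFilter that silently drops events once a configured average rate is exceeded. The snippet below is an untested sketch against a stock rolling-file appender definition (the paths are the package defaults, only the BurstFilter line is new, and the rate/maxBurst values are placeholders that would need tuning):
<RollingFile name="rolling-file" fileName="/var/log/graylog-server/server.log"
             filePattern="/var/log/graylog-server/server.log.%i.gz">
    <!-- existing PatternLayout, Policies and rollover strategy kept as-is -->
    <!-- BurstFilter: events at ERROR and below are limited to ~100/s on average,
         with bursts of up to 1000 events allowed; FATAL always passes -->
    <BurstFilter level="ERROR" rate="100" maxBurst="1000"/>
</RollingFile>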
The only way to get Graylog running again was to disable the appender completely:
<!-- <AppenderRef ref="rolling-file"/> -->
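A narrower workaround than disabling file logging entirely might be to raise the level of just the logger that produces the flood. The class name below is taken from the log excerpt above; the package prefix is our assumption and may need adjusting for your Graylog version:
<Loggers>
    <!-- assumed fully-qualified name for the MessagesAdapterES7 logger seen above;
         "fatal" (or "off") keeps its indexing-error spam out of server.log, at the
         cost of also hiding legitimate indexing failures from the file -->
    <Logger name="org.graylog.storage.elasticsearch7.MessagesAdapterES7" level="fatal"/>
    <!-- existing Root logger and other Logger entries kept as-is -->
</Loggers>
Because the level check is applied before the event reaches any appender, such a flood would never be formatted or queued in the first place.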
The file appender should perhaps avoid logging these ElasticsearchExceptions by default. But ultimately, the log appender should not be able to impair Graylog itself.
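One way to limit that impact could be to put the rolling-file appender behind log4j2's Async appender with a bounded, non-blocking queue, so that a log flood gets dropped instead of stalling the threads that emit it. Again an untested sketch; the appender name async-rolling-file is made up here:
<!-- wraps the existing rolling-file appender; with blocking="false", events are
     discarded once the 1024-slot queue is full instead of blocking the caller -->
<Async name="async-rolling-file" bufferSize="1024" blocking="false">
    <AppenderRef ref="rolling-file"/>
</Async>
<!-- loggers would then reference "async-rolling-file" instead of "rolling-file" -->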
Steps to Reproduce (for bugs)
Index a lot of messages that produce a lot of indexing errors, then rotate the main index.
Your Environment
- Graylog Version: 4.2.5
- Java Version: openjdk-17
- Elasticsearch Version: 7.10.2
- MongoDB Version: 4.0.28
- Operating System: Ubuntu 20.04
This sounds like an operational problem.
If this is still an issue, may I ask how much memory the Graylog node is provisioned with, and how much of it is assigned to the JVM heap?