graylog2-server
Optimize message memory footprint in output path
This is a follow-up issue with tasks remaining from https://github.com/Graylog2/graylog2-server/pull/19982:
TODO:
- [x] Consider not memoizing serialized messages when batch size config is still count-based:
- https://github.com/Graylog2/graylog2-server/pull/19982#discussion_r1701599821:
What do you think about skipping the serialization if output_batch_size is configured for a number of messages instead of bytes? Otherwise, we would always serialize early and have the memory overhead, even if we don't need it.
- [ ] Consider clearing memoized serialized value as soon as it has been sent to the indexer, to make it available to GC immediately
- https://github.com/Graylog2/graylog2-server/pull/19982#discussion_r1701625862:
I wonder if we should clear the serialized message from the cache here to avoid keeping it in memory for the lifetime of the message object.
Not sure if we can do that here because the serialized message might be used by other outputs? Right now we only use it in the Elasticsearch/OpenSearch outputs.
- [x] Not related to memory footprint, but to improve the `ImmutableMessage` interface a bit, return Immutable* collections where applicable:
- https://github.com/Graylog2/graylog2-server/pull/19982#discussion_r1701601868
- [ ] (Optional, for bonus points) Try to remove the `invalidTimestampMeter` from the serialization calls. It feels like it shouldn't be there.
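
A rough sketch of how the first two tasks could fit together, for discussion only. All class and method names here (`SketchMessage`, `SketchBatchSize`, `SketchOutput`, `sizeForBatching`, `send`) are hypothetical and not taken from the Graylog code base, and the `toString()`-based serialization is just a placeholder for the real one. The idea: memoize the serialized bytes only when the batch limit is byte-based, and drop the memoized bytes right after the message has been handed to the indexer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a message that can cache its serialized form.
final class SketchMessage {
    private final Map<String, Object> fields;
    private byte[] serialized;   // memoized bytes, null when nothing is cached
    int serializeCalls = 0;      // counts real serialization work, for illustration

    SketchMessage(Map<String, Object> fields) {
        this.fields = new TreeMap<>(fields);
    }

    // Returns the serialized form; caches it only when memoize is true.
    byte[] serialize(boolean memoize) {
        if (serialized != null) {
            return serialized;
        }
        serializeCalls++;
        // Placeholder serialization, assumed for this sketch only.
        byte[] bytes = fields.toString().getBytes(StandardCharsets.UTF_8);
        if (memoize) {
            serialized = bytes;
        }
        return bytes;
    }

    // Releases the cached bytes so they become eligible for GC.
    void clearSerialized() {
        serialized = null;
    }

    boolean hasMemoizedBytes() {
        return serialized != null;
    }
}

// Hypothetical model of a batch size configured either as a count or as bytes.
final class SketchBatchSize {
    final boolean byteBased;
    final long limit;

    private SketchBatchSize(boolean byteBased, long limit) {
        this.byteBased = byteBased;
        this.limit = limit;
    }

    static SketchBatchSize ofCount(long messages) {
        return new SketchBatchSize(false, messages);
    }

    static SketchBatchSize ofBytes(long bytes) {
        return new SketchBatchSize(true, bytes);
    }
}

final class SketchOutput {
    // Batching step: serialize early only if the limit is measured in bytes.
    // With a count-based limit every message simply weighs 1.
    static long sizeForBatching(SketchMessage message, SketchBatchSize config) {
        return config.byteBased ? message.serialize(true).length : 1;
    }

    // Send step: reuse memoized bytes if present, then drop them immediately.
    static byte[] send(SketchMessage message) {
        byte[] payload = message.serialize(false);
        message.clearSerialized();
        return payload;
    }
}
```

Clearing in `send` is only safe as long as no other output reads the memoized value afterwards, which is the open question from the second review comment. If more outputs start using it, the clear would have to move to a point after all of them are done.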