Output Processor Multi-Threading not working as expected (Mutex wait)
Bug Report
Describe the bug: We've observed an issue with the output multi-threading (workers) option that appears to limit processor performance in our setup. Observations:
- Output multi-threading does not seem to work as expected.
- Specifically, each thread appears to spend a significant amount of time in mutexwait inside flb_processor_run instead of running in parallel.
For instance (in our configuration):
- With 2 output workers, each output thread waits in mutexwait for approximately 50% of the time.
- With 10 output workers, this waiting time increases to around 90%.
- With 100 output workers, it almost reaches 99% of the time, indicating a significant serial processing bottleneck.
- Currently, there is no measurable benefit when using the multi-threading option of the outputs, except for very rare use cases where the log sink is notably slow or has a high response time.
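The percentages above are roughly what a fully serialized stage would produce. A minimal sketch of that arithmetic, assuming (this is a simplification, not something measured inside Fluent Bit) that every worker is always busy and does all of its work while holding one shared lock. The function name `expected_wait_fraction` is hypothetical and exists only for illustration:

```python
def expected_wait_fraction(workers: int) -> float:
    """Fraction of time each worker waits if one shared lock serializes all work.

    Simplified model (an assumption, not measured Fluent Bit behavior): each
    worker holds the lock 1/N of the time and waits for the remaining time.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return 1.0 - 1.0 / workers


# Compare the model with the worker counts reported in this issue.
for n in (2, 10, 100):
    print(f"{n:>3} workers: {expected_wait_fraction(n):.0%} waiting")
```

Under this model the waiting share is 1 - 1/N, which lines up with the reported 50%, 90% and 99% for 2, 10 and 100 workers.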
Expected Behavior: The output threads (including the processors) should ideally run in parallel, minimizing the waiting time in mutexwait and thus optimizing the overall performance.
To Reproduce: Simplified example configuration: https://gist.github.com/drbugfinder-work/456ef9715db25372a935d2d3a997e049 (You may have to adjust the number of dummy inputs or the Lua calculation to get similar results on your machine.)
Screenshots (images not preserved):
- 2 workers
- 10 workers
- 100 workers
Your Environment
- Version used: v2.2.0
Hi @drbugfinder-work, what you observed is the expected behavior when filters are used in the processor stack in the output stage.
We are working on improving the situation but there is no way around it at the moment.
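For readers who want to see the effect in isolation, here is a rough sketch of the same pattern outside Fluent Bit. It assumes the filter stage is guarded by a single lock shared by all workers; the thread only confirms the wait in flb_processor_run, not the exact locking scheme. `run_workers` and the sleep-based "filter work" are hypothetical stand-ins, not Fluent Bit code:

```python
import threading
import time


def run_workers(workers: int, chunks_per_worker: int, shared_lock: bool) -> float:
    """Return the wall-clock seconds needed for all workers to process their chunks."""
    lock = threading.Lock()  # hypothetical stand-in for the processor mutex

    def worker() -> None:
        own_lock = threading.Lock()  # uncontended lock for the parallel case
        for _ in range(chunks_per_worker):
            with (lock if shared_lock else own_lock):
                time.sleep(0.01)  # stand-in for filter work; sleep releases the GIL

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


serialized = run_workers(workers=4, chunks_per_worker=5, shared_lock=True)
parallel = run_workers(workers=4, chunks_per_worker=5, shared_lock=False)
print(f"shared lock: {serialized:.3f}s, per-worker lock: {parallel:.3f}s")
```

With the shared lock the total time grows with the number of workers, because only one worker can be inside the "filter" at a time. With per-worker locks the workers overlap and the total stays close to the time of a single worker.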
@leonardo-albertovich thanks for the clarification. I was wondering, because @patrick-stephens mentioned this as a way to parallelize filters (see: https://github.com/fluent/fluent-bit/issues/8088#issuecomment-1818985119 & https://github.com/fluent/fluent-bit/issues/8088#issuecomment-1819057559)
Yeah I wasn't really thinking of adding multiple workers as well as processors though :)
@patrick-stephens I've added a chunk mode to the Lua filter, so with the help of Lua Lanes the Lua script can be parallelized over a whole chunk: https://github.com/fluent/fluent-bit/pull/8478
Sounds interesting @drbugfinder-work. @tarruda and @agup006 may be interested in that PR.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.
Still open
This issue was closed because it has been stalled for 5 days with no activity.