Output Processor Multi-Threading not working as expected (Mutex wait)

Open drbugfinder-work opened this issue 2 years ago • 7 comments

Bug Report

Describe the bug: We've observed an issue with the output multi-threading (workers) option that appears to limit processor performance in our setup. Observations:

  • Output multi-threading does not seem to work as expected.
  • Specifically, each thread appears to spend a significant amount of time in mutexwait inside flb_processor_run instead of running in parallel with the others.

For instance (in our configuration):

  • With 2 output workers, each output thread waits in mutexwait for approximately 50% of the time.
  • With 10 output workers, this waiting time increases to around 90%.
  • With 100 output workers, it almost reaches 99% of the time, indicating a significant serial processing bottleneck.
  • Currently, there is no measurable benefit when using the multi-threading option of the outputs, except for very rare use cases where the log sink is notably slow or has a high response time.
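
These numbers are roughly what one would expect if the processor stage were fully serialized behind a single mutex: with N workers that always have work queued, each worker holds the lock about 1/N of the time and waits for the remaining (N-1)/N. A minimal Python sketch of that back-of-the-envelope model follows. The function name is hypothetical and the single-lock model is an assumption, not something taken from the Fluent Bit source:

```python
def expected_wait_fraction(workers: int) -> float:
    """Approximate share of time each worker spends waiting on one shared lock.

    Assumption (not from the Fluent Bit source): all processor work happens
    inside a single critical section and every worker always has work queued,
    so only one of `workers` threads makes progress at any moment.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    return (workers - 1) / workers


# Compare the model with the worker counts from the report.
for n in (2, 10, 100):
    print(f"{n:>3} workers: ~{expected_wait_fraction(n):.0%} waiting")
```

Under this assumption the model gives about 50%, 90% and 99% for 2, 10 and 100 workers, which is close to the waiting times listed above. Adding workers would then only increase contention instead of throughput.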

Expected Behavior: The output threads (including the processors) should ideally run in parallel, minimizing the waiting time in mutexwait and thus optimizing the overall performance.

To Reproduce: Simplified example configuration: https://gist.github.com/drbugfinder-work/456ef9715db25372a935d2d3a997e049 (You may have to adjust the number of dummy inputs or the Lua calculation to get similar results on your machine.)

Screenshots

  • 2 workers: Screenshot 2023-12-04 at 15:59:43

  • 10 workers: Screenshot 2023-12-01 at 11:22:03

  • 100 workers: Screenshot 2023-12-01 at 09:13:33

Your Environment

  • Version used: v2.2.0

drbugfinder-work • Dec 04 '23 16:12

Hi @drbugfinder-work, what you observed is the expected behavior when filters are used in the processor stack in the output stage.

We are working on improving the situation but there is no way around it at the moment.

leonardo-albertovich • Dec 04 '23 17:12

@leonardo-albertovich thanks for the clarification. I was wondering because @patrick-stephens mentioned this as a way to parallelize filters (see: https://github.com/fluent/fluent-bit/issues/8088#issuecomment-1818985119 and https://github.com/fluent/fluent-bit/issues/8088#issuecomment-1819057559)

drbugfinder-work • Dec 04 '23 17:12

Yeah I wasn't really thinking of adding multiple workers as well as processors though :)

patrick-stephens • Dec 05 '23 10:12

@patrick-stephens I've added a chunk mode to the Lua filter, so with the help of Lua Lanes the Lua script can be parallelized across a whole chunk: https://github.com/fluent/fluent-bit/pull/8478

drbugfinder-work • Feb 12 '24 14:02

Sounds interesting @drbugfinder-work, @tarruda and @agup006 may be interested in that PR.

patrick-stephens • Feb 12 '24 14:02

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] • May 13 '24 01:05

Still open

drbugfinder-work • May 16 '24 11:05

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] • Aug 16 '24 01:08

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] • Aug 22 '24 01:08