
engine: close descriptors opened by mk_event_channel_create.

Open: pwhelan opened this pull request 3 years ago, 5 comments

This pull request fixes several file descriptor leaks caused by descriptors opened with mk_event_channel_create never being closed. Most of them have not caused problems in practice, because they usually occur only once, when the main engine starts and stops.

This work is relevant to #5580.
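A minimal sketch of the leak pattern, assuming a POSIX system where an event channel is backed by a plain pipe(2) pair. The names demo_channel, demo_channel_create and demo_channel_destroy are hypothetical and are not the Monkey or Fluent Bit API; they only illustrate that both descriptors returned by a channel create call need a matching close on shutdown.

```c
#include <unistd.h>

/* Hypothetical stand-in for an event channel: both ends of a pipe. */
struct demo_channel {
    int read_fd;
    int write_fd;
};

/* Open both ends of the channel. Returns 0 on success, -1 on error. */
int demo_channel_create(struct demo_channel *ch)
{
    int fds[2];

    if (pipe(fds) == -1) {
        ch->read_fd = -1;
        ch->write_fd = -1;
        return -1;
    }
    ch->read_fd = fds[0];
    ch->write_fd = fds[1];
    return 0;
}

/* Close both ends. Skipping this call leaks two descriptors per channel.
 * Resetting the fields to -1 makes a repeated destroy call harmless. */
void demo_channel_destroy(struct demo_channel *ch)
{
    if (ch->read_fd >= 0) {
        close(ch->read_fd);
        ch->read_fd = -1;
    }
    if (ch->write_fd >= 0) {
        close(ch->write_fd);
        ch->write_fd = -1;
    }
}
```

Pairing every create with a destroy in the same owner keeps the lifetime of both descriptors obvious, which is the property this PR restores for the engine's channels.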


Enter [N/A] in the box if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

  • [x] Example configuration file for the change
  • [x] Debug log output from testing the change
  • [x] Attached Valgrind output showing that no leaks or memory corruption were found

Backporting

  • [ ] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

pwhelan, Aug 01 '22 18:08

Here is a Valgrind run with file descriptor tracking enabled (--track-fds=yes). At exit, only the three standard descriptors remain open:

valgrind --track-fds=yes --leak-check=full ./bin/fluent-bit -i dummy -o stdout -m '*' -f 1
==22151== Memcheck, a memory error detector
==22151== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==22151== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==22151== Command: ./bin/fluent-bit -i dummy -o stdout -m * -f 1
==22151== 
Fluent Bit v1.9.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/08/01 14:36:31] [ info] [fluent bit] version=1.9.7, commit=dc970d8fd2, pid=22151
[2022/08/01 14:36:31] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/08/01 14:36:31] [ info] [cmetrics] version=0.3.5
[2022/08/01 14:36:31] [ info] [sp] stream processor started
[2022/08/01 14:36:31] [ info] [output:stdout:stdout.0] worker #0 started
[0] dummy.0: [1659378992.113382214, {"message"=>"dummy"}]
^C[2022/08/01 14:36:33] [engine] caught signal (SIGINT)
[0] dummy.0: [1659378993.104012591, {"message"=>"dummy"}]
[2022/08/01 14:36:33] [ warn] [engine] service will shutdown in max 5 seconds
[2022/08/01 14:36:34] [ info] [engine] service has stopped (0 pending tasks)
[2022/08/01 14:36:34] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/08/01 14:36:34] [ info] [output:stdout:stdout.0] thread worker #0 stopped
==22151== 
==22151== FILE DESCRIPTORS: 3 open (3 std) at exit.
==22151== 
==22151== HEAP SUMMARY:
==22151==     in use at exit: 107,042 bytes in 3,675 blocks
==22151==   total heap usage: 5,650 allocs, 1,975 frees, 1,025,491 bytes allocated
==22151== 
==22151== LEAK SUMMARY:
==22151==    definitely lost: 0 bytes in 0 blocks
==22151==    indirectly lost: 0 bytes in 0 blocks
==22151==      possibly lost: 0 bytes in 0 blocks
==22151==    still reachable: 107,042 bytes in 3,675 blocks
==22151==         suppressed: 0 bytes in 0 blocks
==22151== Reachable blocks (those to which a pointer was found) are not shown.
==22151== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==22151== 
==22151== For lists of detected and suppressed errors, rerun with: -s
==22151== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

pwhelan, Aug 01 '22 18:08

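The same run without Valgrind, with verbose logging enabled (-v). The debug lines show the read and write descriptors of the event channel created for each plugin instance: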

./bin/fluent-bit -v -i dummy -o stdout -m '*' -f 1
Fluent Bit v1.9.7
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/08/01 14:42:45] [ info] Configuration:
[2022/08/01 14:42:45] [ info]  flush time     | 1.000000 seconds
[2022/08/01 14:42:45] [ info]  grace          | 5 seconds
[2022/08/01 14:42:45] [ info]  daemon         | 0
[2022/08/01 14:42:45] [ info] ___________
[2022/08/01 14:42:45] [ info]  inputs:
[2022/08/01 14:42:45] [ info]      dummy
[2022/08/01 14:42:45] [ info] ___________
[2022/08/01 14:42:45] [ info]  filters:
[2022/08/01 14:42:45] [ info] ___________
[2022/08/01 14:42:45] [ info]  outputs:
[2022/08/01 14:42:45] [ info]      stdout.0
[2022/08/01 14:42:45] [ info] ___________
[2022/08/01 14:42:45] [ info]  collectors:
[2022/08/01 14:42:45] [ info] [fluent bit] version=1.9.7, commit=dc970d8fd2, pid=23191
[2022/08/01 14:42:45] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2022/08/01 14:42:45] [ info] [storage] version=1.2.0, type=memory-only, sync=normal, checksum=disabled, max_chunks_up=128
[2022/08/01 14:42:45] [ info] [cmetrics] version=0.3.5
[2022/08/01 14:42:45] [debug] [dummy:dummy.0] created event channels: read=21 write=22
[2022/08/01 14:42:45] [debug] [stdout:stdout.0] created event channels: read=23 write=24
[2022/08/01 14:42:45] [debug] [router] match rule dummy.0:stdout.0
[2022/08/01 14:42:45] [ info] [sp] stream processor started
[2022/08/01 14:42:45] [ info] [output:stdout:stdout.0] worker #0 started
[2022/08/01 14:42:46] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/08/01 14:42:47] [debug] [task] created task=0x7ffff000e750 id=0 OK
[2022/08/01 14:42:47] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [1659379366.096184063, {"message"=>"dummy"}]
[2022/08/01 14:42:47] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/08/01 14:42:47] [debug] [out flush] cb_destroy coro_id=0
[2022/08/01 14:42:47] [debug] [task] destroy task=0x7ffff000e750 (task_id=0)
[2022/08/01 14:42:48] [debug] [task] created task=0x7ffff000e9e0 id=0 OK
[2022/08/01 14:42:48] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [1659379367.096197757, {"message"=>"dummy"}]
[2022/08/01 14:42:48] [debug] [out flush] cb_destroy coro_id=1
[2022/08/01 14:42:48] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/08/01 14:42:48] [debug] [task] destroy task=0x7ffff000e9e0 (task_id=0)
[2022/08/01 14:42:49] [debug] [task] created task=0x7ffff000eb60 id=0 OK
[2022/08/01 14:42:49] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [1659379368.096204104, {"message"=>"dummy"}]
[2022/08/01 14:42:49] [debug] [out flush] cb_destroy coro_id=2
[2022/08/01 14:42:49] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/08/01 14:42:49] [debug] [task] destroy task=0x7ffff000eb60 (task_id=0)
[2022/08/01 14:42:50] [debug] [task] created task=0x7ffff000ece0 id=0 OK
[2022/08/01 14:42:50] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[0] dummy.0: [1659379369.096199465, {"message"=>"dummy"}]
[2022/08/01 14:42:50] [debug] [input chunk] update output instances with new chunk size diff=26
[2022/08/01 14:42:50] [debug] [out flush] cb_destroy coro_id=3
[2022/08/01 14:42:50] [debug] [task] destroy task=0x7ffff000ece0 (task_id=0)
^C[2022/08/01 14:42:50] [engine] caught signal (SIGINT)
[2022/08/01 14:42:50] [debug] [task] created task=0x7ffff000edb0 id=0 OK
[2022/08/01 14:42:50] [debug] [output:stdout:stdout.0] task_id=0 assigned to thread #0
[2022/08/01 14:42:50] [ warn] [engine] service will shutdown in max 5 seconds
[0] dummy.0: [1659379370.096203320, {"message"=>"dummy"}]
[2022/08/01 14:42:50] [debug] [out flush] cb_destroy coro_id=4
[2022/08/01 14:42:50] [debug] [task] destroy task=0x7ffff000edb0 (task_id=0)
[2022/08/01 14:42:51] [ info] [engine] service has stopped (0 pending tasks)
[2022/08/01 14:42:51] [ info] [output:stdout:stdout.0] thread worker #0 stopping...
[2022/08/01 14:42:51] [ info] [output:stdout:stdout.0] thread worker #0 stopped

pwhelan, Aug 01 '22 18:08
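The debug lines "created event channels: read=21 write=22" and "read=23 write=24" show one channel per plugin instance. A sketch of the matching bookkeeping, assuming each instance owns one pipe-backed channel; demo_instance, demo_instances_start and demo_instances_stop are hypothetical names used only for illustration. Shutdown walks the same list and closes every channel, newest first.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical plugin instance owning one event channel (a pipe pair). */
struct demo_instance {
    const char *name;
    int ch_read;
    int ch_write;
};

/* Close every channel that is still open, newest first. */
void demo_instances_stop(struct demo_instance *ins, size_t n)
{
    for (size_t i = n; i-- > 0;) {
        if (ins[i].ch_read >= 0) {
            close(ins[i].ch_read);
            ins[i].ch_read = -1;
        }
        if (ins[i].ch_write >= 0) {
            close(ins[i].ch_write);
            ins[i].ch_write = -1;
        }
    }
}

/* Create one channel per instance and log its descriptors in the style of
 * Fluent Bit's debug output. On failure, close what was already opened. */
int demo_instances_start(struct demo_instance *ins, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ins[i].ch_read = -1;
        ins[i].ch_write = -1;
    }
    for (size_t i = 0; i < n; i++) {
        int fds[2];

        if (pipe(fds) == -1) {
            demo_instances_stop(ins, n);
            return -1;
        }
        ins[i].ch_read = fds[0];
        ins[i].ch_write = fds[1];
        printf("[debug] [%s] created event channels: read=%d write=%d\n",
               ins[i].name, fds[0], fds[1]);
    }
    return 0;
}
```

Cleaning up partially created channels on the error path matters for the same reason as the normal shutdown path: a failed start should not leave descriptors behind either.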

@pwhelan Thanks for this.

Note that it's better to implement a native call in monkey/monkey to destroy a channel. CI is failing because closing the channel needs to go through the pipe wrappers to work properly.

edsiper, Aug 02 '22 03:08
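The point about the pipe wrappers can be sketched as a portable close, under the assumption (not verified against this CI failure) that on Windows these channel descriptors are sockets and must be released with closesocket() rather than close(). As far as I know, Fluent Bit keeps its own wrappers for this in flb_pipe.h; the demo_* names and the DEMO_INVALID_FD macro below are hypothetical and only show the shape of such a wrapper.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET demo_pipefd_t;
#define DEMO_INVALID_FD INVALID_SOCKET
#else
#include <unistd.h>
typedef int demo_pipefd_t;
#define DEMO_INVALID_FD (-1)
#endif

/* Release one end of a channel with the call that matches how it was
 * created: sockets on Windows (assumed here), plain descriptors elsewhere.
 * Returns 0 on success, -1 on failure. */
int demo_pipe_close(demo_pipefd_t fd)
{
    if (fd == DEMO_INVALID_FD) {
        return 0;
    }
#ifdef _WIN32
    return closesocket(fd) == 0 ? 0 : -1;
#else
    return close(fd) == 0 ? 0 : -1;
#endif
}

/* Close both ends and mark them invalid so a repeated call is harmless. */
void demo_channel_close(demo_pipefd_t *r, demo_pipefd_t *w)
{
    demo_pipe_close(*r);
    demo_pipe_close(*w);
    *r = DEMO_INVALID_FD;
    *w = DEMO_INVALID_FD;
}
```

Keeping the platform switch in one place means the engine's shutdown code can call a single function and stay identical across platforms.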

@pwhelan Great PR (also commenting so I get subscribed to this PR)

ptsneves, Aug 02 '22 08:08

> note that it's better to implement the native call in monkey/monkey to destroy a channel

I opened a PR adding a new mk_event_channel_destroy function to Monkey: https://github.com/monkey/monkey/pull/373. No Fluent Bit PR calls this function yet.

pwhelan, Aug 02 '22 14:08
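What a channel destroy call has to undo depends on the event backend. This is a sketch assuming Linux epoll and a pipe-backed channel whose read end is registered with the loop; demo_loop_channel_create and demo_loop_channel_destroy are hypothetical and do not reproduce the signature from monkey/monkey#373. The destroy step unregisters the read end first, then closes both ends. Closing a descriptor removes it from an epoll set only once every reference to the underlying open file is gone, so the explicit EPOLL_CTL_DEL is the safer order.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Create a channel and register its read end with an epoll instance. */
int demo_loop_channel_create(int epfd, int *r_fd, int *w_fd)
{
    int fds[2];
    struct epoll_event ev = { 0 };

    if (pipe(fds) == -1) {
        return -1;
    }
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    *r_fd = fds[0];
    *w_fd = fds[1];
    return 0;
}

/* Undo the create: unregister the read end, then close both ends. */
int demo_loop_channel_destroy(int epfd, int r_fd, int w_fd)
{
    int ret = 0;

    if (epoll_ctl(epfd, EPOLL_CTL_DEL, r_fd, NULL) == -1) {
        ret = -1;   /* still close the descriptors below */
    }
    if (close(r_fd) == -1) {
        ret = -1;
    }
    if (close(w_fd) == -1) {
        ret = -1;
    }
    return ret;
}
```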

I will revisit this PR later while working on the reload feature.

pwhelan, Jul 19 '23 01:07
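If a reload stops and restarts the engine, a leak that happens once per start/stop cycle becomes a leak per reload. The sketch below shows the difference, assuming pipe-backed channels and a kernel that hands out the lowest free descriptor number (as Linux does). demo_engine_cycle is hypothetical and leaks on purpose when close_on_stop is false, so the two behaviours can be compared: with the close, each cycle reuses the same descriptor numbers; without it, they climb by two per channel per cycle.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical engine start/stop cycle: open one channel per instance,
 * then, if close_on_stop is true, close them all again as a reload would.
 * Returns the read descriptor of the first channel, or -1 on error. */
int demo_engine_cycle(int n_instances, bool close_on_stop)
{
    int fds[16][2];
    int first;

    if (n_instances < 1 || n_instances > 16) {
        return -1;
    }
    for (int i = 0; i < n_instances; i++) {
        if (pipe(fds[i]) == -1) {
            /* close what was opened so far */
            for (int j = 0; j < i; j++) {
                close(fds[j][0]);
                close(fds[j][1]);
            }
            return -1;
        }
    }
    first = fds[0][0];
    if (close_on_stop) {
        for (int i = 0; i < n_instances; i++) {
            close(fds[i][0]);
            close(fds[i][1]);
        }
    }
    return first;
}
```

Watching the first channel's descriptor number across repeated reloads is a cheap way to spot this class of leak without running Valgrind.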