in_tail: Add option file_cache_advise to reduce file cache usage
This PR will add option file_cache_advise to the tail plugin. It allows to set the posix_fadvise in POSIX_FADV_DONTNEED mode. This will drastically reduce the usage of the kernel file cache.
This can be necessary when running in K8s environments, where the file cache is accounted to the total memory usage of the pod (see: https://github.com/kubernetes/kubernetes/issues/43916). I/O intensive loads (like Fluent Bit) might run into OOM-kills because of filling the file cache with reads of the tail plugin.
Enter [N/A] in the box, if an item is not applicable to your change.
Testing Before we can approve your change; please submit the following in a comment:
- [x] Example configuration file for the change
- [x] Debug log output from testing the change
- [N/A] Attached Valgrind output that shows no leaks or memory corruption was found
Documentation
- [x] Documentation required for this feature
https://github.com/fluent/fluent-bit-docs/pull/1291
Backporting
- [N/A] Backport to latest stable release.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
fluent-bit.conf
[SERVICE]
Flush 1
Log_Level info
Parsers_File parsers.conf
Daemon off
[INPUT]
Name tail
Tag dummy
Path test/log_file.log
file_cache_advise On
Mem_Buf_Limit 5MB
Skip_Long_Lines On
Refresh_Interval 10
[OUTPUT]
Name null
Match dummy
Valgrind & Debug Level Output:
root@test:~/fbtest# valgrind ./fluent-bit -c fluent-bit.conf
==62202== Memcheck, a memory error detector
==62202== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==62202== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==62202== Command: ./fluent-bit -c fluent-bit.conf
==62202==
Fluent Bit v3.0.0
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io
____________________
< Fluent Bit v2.2.2 >
-------------------
\
\
\ __---__
_- /--______
__--( / \ )XXXXXXXXXXX\v.
.-XXX( O O )XXXXXXXXXXXXXXX-
/XXX( U ) XXXXXXX\
/XXXXX( )--_ XXXXXXXXXXX\
/XXXXX/ ( O ) XXXXXX \XXXXX\
XXXXX/ / XXXXXX \__ \XXXXX
XXXXXX__/ XXXXXX \__---->
---___ XXX__/ XXXXXX \__ /
\- --__/ ___/\ XXXXXX / ___--/=
\-\ ___/ XXXXXX '--- XXXXXX
\-\/XXX\ XXXXXX /XXXXX
\XXXXXXXXX \ /XXXXX/
\XXXXXX > _/XXXXX/
\XXXXX--__/ __-- XXXX/
-XXXXXXXX--------------- XXXXXX-
\XXXXXXXXXXXXXXXXXXXXXXXXXX/
""VXXXXXXXXXXXXXXXXXXV""
[2024/01/25 14:57:01] [ info] Configuration:
[2024/01/25 14:57:01] [ info] flush time | 1.000000 seconds
[2024/01/25 14:57:01] [ info] grace | 5 seconds
[2024/01/25 14:57:01] [ info] daemon | 0
[2024/01/25 14:57:01] [ info] ___________
[2024/01/25 14:57:01] [ info] inputs:
[2024/01/25 14:57:01] [ info] tail
[2024/01/25 14:57:01] [ info] ___________
[2024/01/25 14:57:01] [ info] filters:
[2024/01/25 14:57:01] [ info] ___________
[2024/01/25 14:57:01] [ info] outputs:
[2024/01/25 14:57:01] [ info] null.0
[2024/01/25 14:57:01] [ info] ___________
[2024/01/25 14:57:01] [ info] collectors:
[2024/01/25 14:57:01] [ info] [fluent bit] version=3.0.0, commit=71746b3571, pid=62202
[2024/01/25 14:57:01] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2024/01/25 14:57:01] [ info] [storage] ver=1.5.1, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/01/25 14:57:01] [ info] [cmetrics] version=0.6.6
[2024/01/25 14:57:01] [ info] [ctraces ] version=0.4.0
[2024/01/25 14:57:01] [ info] [input:tail:tail.0] initializing
[2024/01/25 14:57:01] [ info] [output:null:null.0] worker #0 started
[2024/01/25 14:57:01] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/01/25 14:57:01] [debug] [tail:tail.0] created event channels: read=21 write=22
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] inotify watch fd=27
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] scanning path test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] file will be read in POSIX_FADV_DONTNEED mode test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] inode=1835385 with offset=4335453 appended as test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] scan_glob add(): test/log_file_9_2024-01-25-14:10:20.log, inode 1835385
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] 1 new files found on path 'test/log_file_9_2024-01-25-14:10:20.log'
[2024/01/25 14:57:01] [debug] [null:null.0] created event channels: read=29 write=30
[2024/01/25 14:57:01] [ info] [sp] stream processor started
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] inode=1835385 file=test/log_file_9_2024-01-25-14:10:20.log promote to TAIL_EVENT
[2024/01/25 14:57:01] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1835385 watch_fd=1 name=test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:01] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2024/01/25 14:57:10] [debug] [input:tail:tail.0] scanning path test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:10] [debug] [input:tail:tail.0] scan_blog add(): dismissed: test/log_file_9_2024-01-25-14:10:20.log, inode 1835385
[2024/01/25 14:57:10] [debug] [input:tail:tail.0] 0 new files found on path 'test/log_file_9_2024-01-25-14:10:20.log'
^C[2024/01/25 14:57:13] [engine] caught signal (SIGINT)
[2024/01/25 14:57:13] [ warn] [engine] service will shutdown in max 5 seconds
[2024/01/25 14:57:13] [ info] [input] pausing tail.0
[2024/01/25 14:57:14] [ info] [engine] service has stopped (0 pending tasks)
[2024/01/25 14:57:14] [ info] [input] pausing tail.0
[2024/01/25 14:57:14] [ info] [output:null:null.0] thread worker #0 stopping...
[2024/01/25 14:57:14] [debug] [input:tail:tail.0] inode=1835385 removing file name test/log_file_9_2024-01-25-14:10:20.log
[2024/01/25 14:57:14] [ info] [output:null:null.0] thread worker #0 stopped
==62202==
==62202== HEAP SUMMARY:
==62202== in use at exit: 0 bytes in 0 blocks
==62202== total heap usage: 4,534 allocs, 4,534 frees, 863,629 bytes allocated
==62202==
==62202== All heap blocks were freed -- no leaks are possible
==62202==
==62202== For lists of detected and suppressed errors, rerun with: -s
==62202== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Docs PR: https://github.com/fluent/fluent-bit-docs/pull/1291
@cosmo0920 would you be able to take a look at this?
@agup006 @fujimotos @leonardo-albertovich @patrick-stephens
Is this now ready to be merged for Fluent Bit v3.0.0?
I had to resolve a minor merge-conflicts with the new master, so I needed to update this. No change in logic. @cosmo0920 @braydonk @agup006
Tested this in a large cluster where we had frequent oomkill events and can confirm it has helped a lot! Any reason we wouldn't want to default this option to On?
@brenix Thank you for testing. In the long-term, I also definitely would recommend to turn this option on by default, but for the start I don't want create any regressions.
From my side I can also report that in our clusters with many hundreds of Fluent Bit instances running, this fix definitely reduced the OOMkill rate (caused by the high file cache usage with tail plugin) to virtually zero - I am using this for almost 3 months.
@edsiper @agup006 @patrick-stephens @fujimotos @leonardo-albertovich @lecaros
I now changed the file_cache_advise option to be enabled by default. This is tested in our environments for over half a year without any issues.
Docs-PR is here: https://github.com/fluent/fluent-bit-docs/pull/1291
thank you !