Tail plugin incorrectly removing entries from the database file during startup

Open OS-tomaslaw opened this issue 1 year ago • 4 comments

Bug Report

Describe the bug

As part of the changes introduced in https://github.com/fluent/fluent-bit/pull/8062 (in this commit https://github.com/fluent/fluent-bit/pull/8062/commits/c60999c186c23cff79dad4dd31c838404ace228e), Fluent Bit now deletes, on startup, the database entries for files that are not being monitored.

However, this causes undesired behavior when:

  • there is more than 1 instance of the Tail input plugin being used (each monitoring a specific pattern of file names)
  • there is a single database file to keep track of all monitored files and offsets

It seems that the changes introduced in 3.0.2 (https://github.com/fluent/fluent-bit/pull/8062) cause the Tail plugin to compare the files it currently monitors with the entries in the database file. If the database file contains entries without a corresponding monitored file, those entries are removed and any tracking information for those files is lost. When several Tail instances share one database file, each instance appears to treat the entries written by the other instances as unmonitored and removes them.

To Reproduce

Use the following configuration:

[SERVICE]
    log_level    error

[INPUT]
    Name           tail
    Read_from_Head true
    Path           /tmp/foo.txt
    db             /tmp/fluentbit.db

[INPUT]
    Name           tail
    Read_from_Head true
    Path           /tmp/bar.txt
    db             /tmp/fluentbit.db

[OUTPUT]
    Name stdout
    Match *

Where foo.txt and bar.txt contain:

foo.txt

foo

bar.txt

bar

(don't forget the newline after the text)

Notice that on the first Fluent Bit start, the output contains the expected result.

[0] tail.0: [[1717786236.421992900, {}], {"log"=>"foo"}]
[0] tail.1: [[1717786236.422190800, {}], {"log"=>"bar"}]

Notice that after the first restart of Fluent Bit, the output contains the result for the bar.txt file again.

[0] tail.1: [[1717786246.720946400, {}], {"log"=>"bar"}]

Notice that after the second restart of Fluent Bit, the output contains the results for both the foo.txt and bar.txt files again.

[0] tail.0: [[1717786252.299557300, {}], {"log"=>"foo"}]
[0] tail.1: [[1717786252.299697800, {}], {"log"=>"bar"}]

So it seems that the Tail plugin is failing to keep track of the files and their offsets across Fluent Bit restarts.

Looking at the log file, we can see these relevant entries:

[debug] [input:tail:tail.0] 1 new files found on path '/tmp/foo.txt'
[ info] [input:tail:tail.0] db: delete unmonitored stale inodes from the database: count=0
[debug] [input:tail:tail.1] 1 new files found on path '/tmp/bar.txt'
[ info] [input:tail:tail.1] db: delete unmonitored stale inodes from the database: count=1

Expected behavior

The database file should not lose tracking information for files that are still being monitored by Fluent Bit.

Additional context

I can see that a possible solution would be to use a separate database file for each Tail plugin instance. But performing that change to the configuration now would mean that Fluent Bit would start with a new database file. So any tracking information would be lost and all files would be processed again. Is there any suggestion on how I could tackle this issue without causing files to be reprocessed and duplicating information?
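
One option that keeps the existing database file, sketched here as an untested assumption rather than a confirmed fix, is to merge the inputs into a single Tail instance whose Path lists both files separated by commas. With only one instance using /tmp/fluentbit.db, no entry in it should belong to a file that the instance does not monitor at startup. The trade-off is that both files then share one input (and its default tag, tail.0), and any entries the startup cleanup has already deleted are not restored.

```text
# Sketch only: one Tail instance covering both files, reusing the existing
# database file from the reproduction configuration.
[INPUT]
    Name           tail
    Read_from_Head true
    Path           /tmp/foo.txt,/tmp/bar.txt
    db             /tmp/fluentbit.db
```

This only fits setups where the inputs do not need different per-input settings such as distinct tags or parsers.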

Your Environment

  • Version used: 3.0.4
  • Operating System and version: Windows
  • Filters and plugins: Tail

OS-tomaslaw, Jun 07 '24 18:06

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot], Sep 11 '24 01:09

I think this can be solved with a soft delete.

lchoudhu-tibco, Sep 14 '24 13:09

We're seeing this same issue when using the same offsets DB for multiple tail inputs. The simplest fix for us will be using a separate DB per input, but this behavior was a bit unexpected and the documentation isn't really clear on whether using the same DB for multiple inputs is supported or not.

briandefiant, Dec 16 '24 21:12

Yes, we hit the same issue when using multiple tail inputs. Using a separate DB file per input works in my case.

wilalalee, Feb 11 '25 06:02

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot], May 18 '25 02:05

We are also experiencing this issue. Our testing confirms that when multiple tail inputs are configured to use a single, shared database, Fluent Bit reprocesses each input from the beginning after each restart, as if the offset were not tracked at all.

Fluent Bit logs indicate that on each start, it incorrectly treats inode records in the database as stale and removes them:

[2025/06/25 20:17:35] [ info] [input:tail:tail.0] db: delete unmonitored stale inodes from the database: count=1
[2025/06/25 20:17:35] [ info] [input:tail:tail.1] db: delete unmonitored stale inodes from the database: count=1
[2025/06/25 20:17:35] [ info] [input:tail:tail.2] db: delete unmonitored stale inodes from the database: count=1

Tested on versions 3.2.10-1 and 4.0.3-1.

A working workaround is to configure each tail input with its own unique database file.
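
A minimal sketch of that workaround, adapted from the reproduction configuration in the original report. The database file names /tmp/fluentbit-foo.db and /tmp/fluentbit-bar.db are placeholders chosen for illustration; any distinct path per input should do.

```text
# Hypothetical file name: this database is used only by the foo.txt input.
[INPUT]
    Name           tail
    Read_from_Head true
    Path           /tmp/foo.txt
    db             /tmp/fluentbit-foo.db

# Hypothetical file name: this database is used only by the bar.txt input.
[INPUT]
    Name           tail
    Read_from_Head true
    Path           /tmp/bar.txt
    db             /tmp/fluentbit-bar.db
```

Because each instance then compares its monitored files only against its own database, the startup cleanup should no longer remove another instance's entries. As noted in the original report, switching to new, empty database files discards the stored offsets, so existing files are likely to be read again once after the change.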

If fixing the underlying issue does not seem feasible, it would be great if the documentation could be updated to explicitly state this limitation.

aniro-s, Jun 25 '25 20:06