
Multiline filter breaks pipeline

Open drbugfinder-work opened this issue 3 years ago • 2 comments

Bug Report

Describe the bug With the update from Fluent Bit 1.9.2 to >= 1.9.3, we have observed that parts of our pipelines break. We have identified an issue with the multiline filter. Unfortunately, the patch #5564 (v1.9.5) does not seem to help here, unlike in https://github.com/fluent/fluent-bit/issues/5524#issuecomment-1177641746

The observed behaviour is that records which have already been processed re-enter the pipeline from the beginning after the multiline filter merges them. This leads to totally different (and wrong) processing, as records are potentially modified twice when they enter the pipeline again as a merged multiline record.

I'm guessing this is related to the various in_emitter changes (e.g. https://github.com/fluent/fluent-bit/issues/4325#issuecomment-1127495679 cc @nokute78 )

To Reproduce fluent-bit.conf:

[SERVICE]
    Flush 1
    Daemon Off
    Log_Level debug
    Parsers_File /fluent-bit/parsers.conf

[INPUT]
    Name tail
    Read_from_Head true
    Path /fluent-bit/dummy-deployment-123_abc_dummy-xyz.log 
    Parser crio_json
    Tag_Regex (?<pod_name>[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
    Tag dummy.<namespace_name>.<pod_name>.<container_name>

[FILTER]
    name         lua
    match_regex  .*dummy.* 
    script       ./unpack.lua
    call         decapsulate
    alias        unpack

[FILTER]
    name                  multiline
    match                 *dummy*
    multiline.parser      multiline_dummy
    multiline.key_content msg

[OUTPUT]
    Name         stdout
    match        *

dummy-deployment-123_abc_dummy-xyz.log:

2021-12-20T13:09:57.041769095Z stdout F test: C  Wed Mar 17 00:24:57:085 2021
2021-12-20T13:09:57.041772065Z stdout F test: C  Begin of multiline message
2021-12-20T13:09:57.04177497Z stdout F test: C  database name=test
2021-12-20T13:09:57.041793201Z stdout F test: C  database host=test
2021-12-20T13:09:57.041797431Z stdout F test: C  database user=test
2021-12-20T13:09:57.041800225Z stdout F test: C  database port=test
2021-12-20T13:09:57.04180298Z stdout F test: C  abcdefghijklmnop
2021-12-20T13:09:57.041805732Z stdout F test: C  Test
2021-12-20T13:09:57.041808473Z stdout F test: C  end of multiline messag

parsers.conf:

[MULTILINE_PARSER]
    Name          multiline_dummy
    type          regex
    flush_timeout 1000
    rule          "start_state"  "^(?<msg>((\[Thr [0-9]{0,15}\])|([A-Z]{1}){0,1})[ ]{0,2}[A-Z][a-z][a-z] [A-Z][a-z][a-z] [0-3 ][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9](:[0-9]{3}){0,1} [0-9]{4}).*"       "cont"
    rule          "cont"         "^(?!(?<msg>((\[Thr [0-9]{0,15}\])|([A-Z]{1}){0,1})[ ]{0,2}[A-Z][a-z][a-z] [A-Z][a-z][a-z] [0-3 ][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9](:[0-9]{3}){0,1} [0-9]{4}).*)"   "cont"

[PARSER]
    Name crio_json
    Format Regex
    Regex /^(?<time>.+)\b(?<stream>stdout|stderr)(?<_p> [FP] )(?<log>.*)$/
    Time_Key    time
    Time_Format %Y-%m-%dT%H:%M:%S.%L
    Time_Keep   On
    Decode_Field_As   json      log

unpack.lua:

-- extract message from log
function decapsulate(tag, timestamp, record)
  local str = record["log"]
  if str ~= nil then
    print("str= "..str)
    -- Do anything with str here... and save it into msg field
    str = string.gsub(str, "%g+: ", "")
    record["msg"] = str
    record["log"] = nil
    return 2, timestamp, record
  else
    print("##############################################################################")
    print("We never should reach this!")
    return -1, timestamp, record
  end
end

Test it with Fluent Bit 1.9.2 and with 1.9.3 (or later versions): in >= 1.9.3 you will see the multiline record enter unpack.lua a second time after it was combined by the multiline filter.

Expected behavior v1.9.3 and above should behave like v1.9.2 and below (or at least this behaviour change should be documented).

drbugfinder-work avatar Aug 10 '22 10:08 drbugfinder-work

@drbugfinder-work

  1. in_tail reads the log and saves it as the value of log.
  2. filter_lua renames the key log -> msg.
  3. filter_multiline processes the value of msg and re-emits the record.
  4. filter_lua processes the concatenated record again.
  5. filter_multiline processes and re-emits, re-emits, ...

Is the issue steps 4 and 5, since step 3 doesn't rewrite the tag? filter_multiline re-emits using the in_emitter plugin since v1.9.3, and it doesn't touch the tag. https://github.com/fluent/fluent-bit/pull/4671
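The cycle described above can be sketched like this (the tag shown is hypothetical, derived from the repro config's Tag pattern applied to the sample filename):

```
tag = dummy.abc.dummy-deployment-123.dummy-xyz   (never rewritten)

in_tail ──> filter_lua (log -> msg) ──> filter_multiline ──> in_emitter
                ^                                                │
                │   re-emitted record keeps the same tag, so     │
                └── it matches .*dummy.* and *dummy* again <─────┘
```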

@PettitWesley @edsiper We may need an option for filter_multiline to rewrite the tag, to prevent an infinite loop. It should take care of each stream so as not to rewrite all events to the same tag.

nokute78 avatar Aug 11 '22 02:08 nokute78

Hi @nokute78,

when you execute the sample config, you will see that there is no infinite loop (at least in this case), but step 4 is executed once for each multiline record from step 3. Before v1.9.3 this was not the case; processing ended after step 3. I think in general this could lead to infinite loops.

I am aware that we could add a workaround for this issue to our config, and I am not saying that this is a severe bug - but it is definitely an unexpected behaviour change (which can potentially lead to infinite loops in existing pipelines) that should either be fixed or documented correctly.
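One possible workaround sketch (an assumption, not a tested fix): make decapsulate idempotent, so that re-emitted multiline records - which carry msg but no longer a log key - pass through unchanged instead of hitting the drop branch:

```lua
-- Hypothetical guard for unpack.lua: re-emitted records were already
-- unpacked once (they have "msg" but no "log"), so leave them untouched.
function decapsulate(tag, timestamp, record)
  if record["log"] == nil and record["msg"] ~= nil then
    -- Already-unpacked (re-emitted) record: return code 0 = keep unchanged.
    return 0, timestamp, record
  end
  local str = record["log"]
  if str ~= nil then
    -- Strip the "something: " prefix and store the rest in msg.
    str = string.gsub(str, "%g+: ", "")
    record["msg"] = str
    record["log"] = nil
    -- Return code 2 = record and timestamp were modified.
    return 2, timestamp, record
  end
  -- No log and no msg: drop the record (code -1).
  return -1, timestamp, record
end
```

This does not stop the second pass through the pipeline, but it prevents the record from being modified (or dropped) twice.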

Your idea of adding an integrated rewrite tag option for the multiline filter sounds good to me.

drbugfinder-work avatar Aug 11 '22 07:08 drbugfinder-work

I think adding an option to rewrite the tag makes sense.

PettitWesley avatar Aug 12 '22 18:08 PettitWesley

I am also surprised by your use case @drbugfinder-work

When I fixed the filter and made it work using rewrite tag, I assumed that everyone would and could put the multiline filter as the first filter. My thinking was that multiline logs split into multiple events are created by a source/input, and the first thing you always want to do is recombine them, and only then do any further processing. There would be many cases where processing them first would make it impossible to recombine the split records.
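A sketch of that ordering for the repro config (untested, and it assumes the multiline_dummy rules are adapted to match the raw log field, since concatenation would now happen before the lua filter strips the prefix):

```
[FILTER]
    name                  multiline
    match                 *dummy*
    multiline.parser      multiline_dummy
    multiline.key_content log

[FILTER]
    name         lua
    match_regex  .*dummy.*
    script       ./unpack.lua
    call         decapsulate
    alias        unpack
```

With this order the lua filter only sees records after concatenation, so no record is unpacked twice.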

PettitWesley avatar Aug 12 '22 18:08 PettitWesley

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Nov 11 '22 02:11 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Nov 17 '22 02:11 github-actions[bot]