HTTP output to Betterstack crashes with a BusError on FreeBSD 14.2 amd64

Open arsatiki opened this issue 1 year ago • 11 comments

Bug Report

Describe the bug

My goal is to send logs from my FreeBSD host and any jails running on it to Betterstack. When I enable the HTTP output plugin, fluent-bit crashes with a Bus Error when it tries to send the message forward.

To Reproduce

  • Steps to reproduce the problem:

The configuration file looks like this:

[SERVICE]
    flush        1
    log_level    info
    parsers_file parsers.conf
    plugins_file plugins.conf
    http_server  Off
    http_listen  0.0.0.0
    http_port    2020
    storage.metrics on

[INPUT]
    tag  syslog
    name tail
    path /var/log/messages

[INPUT]
    tag siansaksa
    name random

[OUTPUT]
    match *
    name stdout
    format json_lines
        
[OUTPUT]
    name    http
    match   *
    tls     On
    host    in.logs.betterstack.com
    port    443
    uri     /fluentbit
    # Token omitted for privacy
    header  Authorization Bearer XXXXXX
    header  Content-Type application/msgpack
    format  msgpack
    retry_limit 5

I execute Fluent Bit with doas -u nobody /usr/local/bin/fluent-bit -c /usr/local/etc/fluent-bit/fluent-bit.conf.

The execution crashes with a Bus Error after the first random entry is generated:

[2024/12/13 12:18:45] [ info] [config] changing coro_stack_size from 3072 to 4096 bytes
Fluent Bit v3.2.2
* Copyright (C) 2015-2024 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io/

______ _                  _    ______ _ _           _____  _____ 
|  ___| |                | |   | ___ (_) |         |____ |/ __  \
| |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __   / /`' / /'
|  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / /   \ \  / /  
| |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /.___/ /./ /___
\_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/ \____(_)_____/


[2024/12/13 12:18:45] [ info] [fluent bit] version=3.2.2, commit=, pid=23342
[2024/12/13 12:18:45] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2024/12/13 12:18:45] [ info] [simd    ] disabled
[2024/12/13 12:18:45] [ info] [cmetrics] version=0.9.9
[2024/12/13 12:18:45] [ info] [ctraces ] version=0.5.7
[2024/12/13 12:18:45] [ info] [input:tail:tail.0] initializing
[2024/12/13 12:18:45] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2024/12/13 12:18:45] [ info] [input:random:random.1] initializing
[2024/12/13 12:18:45] [ info] [input:random:random.1] storage_strategy='memory' (memory only)
[2024/12/13 12:18:45] [ info] [output:stdout:stdout.0] worker #0 started
[2024/12/13 12:18:45] [ info] [sp] stream processor started
[2024/12/13 12:18:45] [ info] [output:http:http.1] worker #0 started
[2024/12/13 12:18:45] [ info] [output:http:http.1] worker #1 started
{"date":1734085126.234562,"rand_value":6488732564125523264}
Bus error

Expected behavior

  • Fluent will print out random log entries on console
  • Same entries are visible in Betterstack

Your Environment

  • Version used: 3.1.9 and 3.2.2
  • Configuration: See above
  • Operating System and version: FreeBSD 14.2 on amd64

Additional context

Since the shipper doesn't work for me, I've been forced to install Fluentd and it makes me unhappy.

arsatiki • Dec 13 '24 11:12

Some wrangling with the core dump says that the problem is in the ares__slist_node_first function. The address for head seems to be rax = 0xd234bc1b34275b44, which is not 8-byte aligned as a pointer should be on amd64.

arsatiki • Dec 13 '24 12:12

We managed to get it working if there's only one active source and only one active output, random values in this case. Once I write something to the syslog, the next random entry brings down the process again.

arsatiki • Dec 15 '24 12:12

Mitigated the problem by making the coro stack 80k

arsatiki • Dec 15 '24 19:12

Was going to say, we don't technically support it directly as a platform: https://docs.fluentbit.io/manual/installation/supported-platforms

I was assuming you were compiling it directly, so we would need a lot more information about how/what you configured to do that, but it sounds like you've sorted it.

patrick-stephens • Dec 16 '24 10:12

Initially I got it from the package system, i.e. as a prebuilt binary. See https://www.freshports.org/sysutils/fluent-bit/ for example. I did my own build for debugging purposes to get debugging symbols and address sanitizer. That build also used the setup from the ports system, so both cases built it the same way.

arsatiki • Dec 16 '24 10:12

That build is unrelated to this project so we cannot support it.

Did you get it going then with the coro stack size change?

patrick-stephens • Dec 16 '24 10:12

It's been running 15 hours now without crashing, keeping my fingers crossed :D

arsatiki • Dec 16 '24 11:12

It might be worth adding to the general Raspbian builds then here: https://github.com/fluent/fluent-bit/blob/4d715c07d91ae8087a2bf1e6a185b3e95ac18914/packaging/distros/raspbian/Dockerfile#L63-L74

patrick-stephens • Dec 16 '24 13:12

Hi!

I'm the "porter" for fluent-bit to FreeBSD. I guess, since you don't support the platform, I'm trying to do that for you. FreeBSD users are kind of used to this scenario. No problem.

I tried switching clang for gcc just to rule out problems related to clang, and the error persists with gcc as well.

> Mitigated the problem by making the coro stack 80k

How do you do that? Through configuration or in the build?

girgen • Dec 22 '24 10:12

Configuration

arsatiki • Dec 22 '24 10:12
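
For reference, a minimal sketch of what such a configuration change might look like in the classic config format used above. It assumes the coro_stack_size key in the [SERVICE] section (the same setting named in the startup log) and takes "80k" to mean 81920 bytes; the exact value used was not posted in the thread.

```text
[SERVICE]
    flush           1
    log_level       info
    # Assumed value: 80 KiB = 81920 bytes; the exact number was not posted.
    coro_stack_size 81920
```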

So, perhaps we should just add a higher hard-coded value for the coro stack? Check this code:

https://github.com/fluent/fluent-bit/blob/d77c06dc6cae0eefe534435eee0fb024e3dc1021/include/fluent-bit/flb_coro.h

#ifdef FLB_SYSTEM_MACOS
#ifdef __aarch64__
#define STACK_FACTOR 1.5 /* Use 36KiB for coro stacks */
#else
#define STACK_FACTOR 2   /* Use 24KiB for coro stacks */
#endif
#else
#define STACK_FACTOR 1
#endif

#ifdef FLB_CORO_STACK_SIZE
#define FLB_CORO_STACK_SIZE_BYTE      FLB_CORO_STACK_SIZE
#else
#define FLB_CORO_STACK_SIZE_BYTE      ((3 * STACK_FACTOR * PTHREAD_STACK_MIN) / 2)
#endif

Would it make sense to add something similar for BSD + amd64?

girgen • Dec 28 '24 11:12

ping @arsatiki do you agree with bumping the coro stack as per ☝️ ?

girgen • Jan 06 '25 21:01

I've hard-coded it to the expected value according to some documentation printouts. See: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283299

PTHREAD_STACK_MIN is 4 * 512 = 2048 in FreeBSD's include/pthread.h, so it seems way too low in comparison with the 24576 mentioned in the docs (https://docs.fluentbit.io/manual/administration/configuring-fluent-bit).

Would this be OK?

--- include/fluent-bit/flb_coro.h.orig	2024-12-30 22:32:11.000000000 +0100
+++ include/fluent-bit/flb_coro.h	2025-01-06 23:50:52.035541000 +0100
@@ -68,7 +68,11 @@
 #define STACK_FACTOR 2   /* Use 24KiB for coro stacks */
 #endif
 #else
+#ifdef FLB_SYSTEM_FREEBSD
+#define FLB_CORO_STACK_SIZE 24576 /* FreeBSD's PTHREAD_STACK_MIN is just 2048 */
+#else
 #define STACK_FACTOR 1
+#endif
 #endif
 
 #ifdef FLB_CORO_STACK_SIZE

girgen • Jan 06 '25 23:01

> ping @arsatiki do you agree with bumping the coro stack as per ☝️ ?

Oops sorry, completely missed your original question because of Christmas 😅 That sounds like a reasonable approach to me. I originally used 20k as the stack size, but that wasn't enough. I then used 80k as mentioned above, but it seems 24576 works too. I'll let you know if it does crash later though 😄

arsatiki • Jan 07 '25 05:01

@girgen It's still running so let's say 24576 is okay.

arsatiki • Jan 07 '25 11:01

Excellent. @patrick-stephens is there a point in me making a pull request?

Would you consider recognising the FreeBSD port as some sort of "community supported distribution"? See https://github.com/freebsd/freebsd-ports/tree/main/sysutils/fluent-bit

I added the above patch there in https://github.com/freebsd/freebsd-ports/blob/main/sysutils/fluent-bit/files/patch-include__flb_coro.h and it is used already as per https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283299

girgen • Jan 07 '25 11:01

I mean it would be great to submit a patch to ensure it works here but you can also point people at the BSD side maybe from the docs? Maybe from https://github.com/fluent/fluent-bit-docs/blob/master/installation/supported-platforms.md?

Even though we do not officially build for FreeBSD there's no issue with having a patch to support it (as long as it does not break any other targets). Probably simplifies your downstream usage then as well.

patrick-stephens • Jan 07 '25 13:01