
Fluent-bit v1.8.15 and v1.9.3 output azure on Windows fails to connect to Log Analytics

Open desek opened this issue 3 years ago • 5 comments

Bug Report

Describe the bug

When running fluent-bit 1.8.15 or 1.9.3 in Windows containers (Windows Server 2019 Datacenter 10.0.17763.2686, containerd://1.6.1) with Kubernetes 1.22.8, the azure output plugin reports a connection error and fails to send data to Log Analytics.

The same error appears with both the servercore and nanoserver images, and when running as either ContainerUser or ContainerAdministrator.

To Reproduce

  • Example log output:
[2022/04/28 15:16:25] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:25] [debug] [task] created task=0000029E74F3BF80 id=0 OK
[2022/04/28 15:16:25] [error] [tls] error: unexpected EOF
[2022/04/28 15:16:25] [debug] [upstream] connection #1116 failed to 3863cb67-6c46-4780-854d-5737842a4d18.ods.opinsights.azure.com:443
[2022/04/28 15:16:25] [debug] [out flush] cb_destroy coro_id=0
[2022/04/28 15:16:25] [debug] [retry] new retry created for task_id=0 attempts=1
[2022/04/28 15:16:25] [ warn] [engine] failed to flush chunk '5548-1651158984.511061600.flb', retry in 6 seconds: task_id=0, input=tail.0 > output=azure.0 (out_id=0)
[2022/04/28 15:16:26] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:26] [debug] [task] created task=0000029E74F3B180 id=1 OK
[2022/04/28 15:16:26] [error] [tls] error: unexpected EOF
[2022/04/28 15:16:26] [debug] [upstream] connection #1152 failed to 3863cb67-6c46-4780-854d-5737842a4d18.ods.opinsights.azure.com:443
[2022/04/28 15:16:26] [debug] [out flush] cb_destroy coro_id=1
[2022/04/28 15:16:26] [debug] [retry] new retry created for task_id=1 attempts=1
[2022/04/28 15:16:26] [ warn] [engine] failed to flush chunk '5548-1651158986.5265500.flb', retry in 9 seconds: task_id=1, input=tail.0 > output=azure.0 (out_id=0)
[2022/04/28 15:16:27] [debug] [input chunk] update output instances with new chunk size diff=1027
[2022/04/28 15:16:27] [debug] [task] created task=0000029E74F3BD00 id=2 OK
[2022/04/28 15:16:27] [error] [tls] error: unexpected EOF
  • Steps to reproduce the problem:
  1. Start a plain Windows Server Core or Nano Server container
  2. Download the fluent-bit zip archive and extract it
  3. Run fluent-bit with the configuration shown below (a sample invocation follows this list)
  4. The error appears
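
A sample invocation for step 3, assuming the zip archive was extracted to C:\fluent-bit and the configuration under Additional context was saved as C:\fluent-bit\conf\fluent-bit.conf (paths are illustrative, not taken from the report):

    C:\fluent-bit\bin\fluent-bit.exe -c C:\fluent-bit\conf\fluent-bit.conf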

Expected behavior

Output plugin should successfully send data to Log Analytics.

Your Environment

  • Version used: 1.8.15 and 1.9.3
  • Configuration: see Additional context below
  • Environment name and version (e.g. Kubernetes? What version?):
    • Kubernetes 1.22.8
    • Windows Server 2019 Datacenter 10.0.17763.2686 containerd://1.6.1
  • Server type and version: N/A
  • Operating System and version:
    • mcr.microsoft.com/windows/nanoserver:1809 runtime container
    • mcr.microsoft.com/windows/servercore:1809 runtime container
  • Filters and plugins:
    • Input: tail
    • Filter: kubernetes
    • Output: azure
    • Parser: cri

Additional context

Config:

    [SERVICE]
        Flush         1
        Log_Level     trace
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              C:\\var\\log\\containers\\fluent-bit*.log
        Parser            cri
        DB                C:\\var\\flb\\tail_cri.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Kube_CA_File        C:\\var\\run\\secrets\\kubernetes.io\\serviceaccount\\ca.crt
        Kube_Token_File     C:\\var\\run\\secrets\\kubernetes.io\\serviceaccount\\token
        Kube_Tag_Prefix     kube.C.var.log.containers.
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude Off
    [OUTPUT]
        Name        azure
        Match       *
        tls         on
        tls.debug   4
        Customer_ID 3863cb67-6c46-4780-854d-5737842a4d18
        Shared_Key  <redacted>
    [PARSER]
        # http://rubular.com/r/tjUt3Awgg4
        Name cri
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

desek avatar Apr 28 '22 15:04 desek

I am seeing a similar issue when trying to use the cloudwatch_logs output in a Windows-based FluentBit container on Kubernetes. I see the same [tls] error: unexpected EOF when trying to connect to AWS STS and CloudWatch.

I suspect something is going wrong when trying to negotiate TLS in Windows containers.

FluentBit Version: 1.9.1
Windows OS (K8s node): Server 2019
Image base: mcr.microsoft.com/windows/servercore:ltsc2019

bryangardner avatar May 03 '22 04:05 bryangardner

As a workaround I configured the output with tls.verify Off. Not optimal, but it gets the job done for now.
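
A minimal sketch of the azure [OUTPUT] section with that workaround applied; everything except tls.verify is taken from the configuration in the issue:

    [OUTPUT]
        Name        azure
        Match       *
        tls         on
        tls.verify  off
        tls.debug   4
        Customer_ID 3863cb67-6c46-4780-854d-5737842a4d18
        Shared_Key  <redacted>

Note that this skips certificate verification for the Log Analytics endpoint, so it trades transport security for a working pipeline.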

desek avatar May 03 '22 06:05 desek

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Aug 02 '22 02:08 github-actions[bot]

Please remove stale

bryangardner avatar Aug 04 '22 12:08 bryangardner

> As a workaround I configured the output with tls.verify Off. Not optimal, but it gets the job done for now.

@desek and @bryangardner I am facing a similar issue on a Windows node in an EKS cluster. The Fluent Bit logs are similar to https://github.com/fluent/fluent-bit/issues/4727

I tried 'tls.verify Off' in the output but the errors persist. Any suggestions to work around this?

{"log":"[2022/08/11 07:18:33] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...\r\n","stream":"stderr","time":"2022-08-11T07:18:33.5290575Z"} {"log":"[2022/08/11 07:18:33] [debug] [filter:kubernetes:kubernetes.0] Send out request to Kubelet for pods information.\r\n","stream":"stderr","time":"2022-08-11T07:18:33.5296845Z"} {"log":"[2022/08/11 07:18:34] [error] [tls] C:\src\src\tls\mbedtls.c:390 NET - Sending information through the socket failed\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"} {"log":"[2022/08/11 07:18:34] [debug] [upstream] connection #792 failed to 127.0.0.1:10250\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"} {"log":"[2022/08/11 07:18:34] [error] [filter:kubernetes:kubernetes.0] kubelet upstream connection error\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"} {"log":"[2022/08/11 07:18:34] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluent-bit-windows-92pjh\r\n","stream":"stderr","time":"2022-08-11T07:18:34.5385134Z"}

xulfiqar1 avatar Aug 11 '22 09:08 xulfiqar1

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Nov 10 '22 02:11 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Nov 16 '22 02:11 github-actions[bot]

Please re-open. I am seeing the same issue when using 2.0.5 on Windows 2022 nodes in AKS. Disabling tls.verify works and logs are pushed to LAW (Log Analytics workspace).

Kubernetes version: 1.24.6
Node image: AKSWindows-2022-containerd-20348.1131.221019
Image: ghcr.io/fluent/fluent-bit/staging:windows-2022-2.0.5

thebridge90 avatar Nov 16 '22 15:11 thebridge90