
fluent-bit pod having difficulty connecting to Splunk HEC endpoint

Open lifayt opened this issue 1 year ago • 4 comments

Bug Report

Describe the bug

We are attempting to add a Splunk output to the fluent-bit pods that run as part of the EKS Amazon CloudWatch addon. We can connect to the HEC endpoint manually with a curl command like this:

curl --request POST \
  --url https://example.splunkcloud.com/services/collector \
  --header 'Authorization: Splunk <hec-token>' \
  --header 'Content-Type: application/json' \
  --data '{"index": "airflow", "event": "from-fluent-bit-pod"}'

This produces the expected response:

{"text":"Success","code":0}

Similarly, querying the HEC health endpoint works:

curl --request GET \
  --url https://example.splunkcloud.com/services/collector/health 

This also produces the expected response:

{"text":"HEC is healthy","code":17}

However, if we try the same endpoint using the fluent-bit CLI or a config file, we get an error saying the domain name is not found:

[net] getaddrinfo(host='https://example.splunkcloud.com/services/collector', err=4): Domain name not found

Here is an example of how I'm starting fluent-bit:

/fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk -p host=https://example.splunkcloud.com/services/collector -p splunk_token=<token> \
  -p tls=on -p tls.verify=off -m '*'
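
A note on the error above: getaddrinfo reports the full URL as the host it tried to resolve, which suggests the whole string was treated as a DNS name. As far as I know, the splunk output's host parameter expects a bare hostname, with the port passed separately. Below is a minimal sketch of splitting the URL accordingly. The helper name hec_host_from_url is hypothetical (not part of fluent-bit), and port 443 is an assumption based on the https:// URL that works with curl; the plugin's default port is 8088 if I remember correctly.

```shell
#!/bin/sh
# Hypothetical helper (not part of fluent-bit): reduce a URL to its bare hostname.
hec_host_from_url() {
  url=$1
  rest=${url#*://}             # drop the scheme, e.g. "https://"
  rest=${rest%%/*}             # drop the path, e.g. "/services/collector"
  printf '%s\n' "${rest%%:*}"  # drop an explicit ":port" if present
}

HEC_HOST=$(hec_host_from_url 'https://example.splunkcloud.com/services/collector')
echo "$HEC_HOST"   # prints: example.splunkcloud.com

# Assumed invocation (untested against this cluster); port 443 is inferred
# from the https:// URL used with curl, not confirmed in this thread:
# /fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk \
#   -p host="$HEC_HOST" -p port=443 -p splunk_token=<token> \
#   -p tls=on -p tls.verify=off -m '*'
```

The /services/collector path is not passed in host here; whether the plugin appends the collector path itself on 1.9.10 is worth confirming against the plugin documentation for that version.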

Expected behavior

Since I can connect to the Splunk ingestion endpoint using curl, I would expect fluent-bit to be able to connect as well.

Your Environment

  • Version used: Fluent Bit v1.9.10 (Git commit: f4996b8a8e6c82498e95906153738078039c74bd)
  • Environment name and version (e.g. Kubernetes? What version?): EKS Server Version: v1.28.12-eks-2f46c53, installed as part of the aws cloudwatch eks addon.

It would be particularly helpful if I could get some feedback on how to better diagnose what the issue is here. I work in a corporate environment, so there's always lots of networking/firewall issues to contend with, but I'm not sure how to get at the guts of what fluent-bit is running into (since my attempts at debugging it by posting events manually to splunk are all working).
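
One cheap first diagnostic, sketched here as an assumption rather than an official fluent-bit procedure, is to check that the value passed as host is something DNS can resolve at all before suspecting the network. The function name is_bare_hostname is hypothetical. The commented commands assume a shell and getent are available in the pod image, which may not hold for every image.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the value has no scheme and no path,
# i.e. it is shaped like something getaddrinfo could look up.
is_bare_hostname() {
  case $1 in
    ''|*://*|*/*) return 1 ;;
    *) return 0 ;;
  esac
}

for candidate in 'https://example.splunkcloud.com/services/collector' \
                 'example.splunkcloud.com'; do
  if is_bare_hostname "$candidate"; then
    echo "ok: $candidate"
  else
    echo "not a bare hostname: $candidate"
  fi
done

# If the value is a bare hostname and the error persists, these may narrow it
# down further (assumes the tools exist in the pod):
#   getent hosts example.splunkcloud.com   # does DNS resolve inside the pod?
#   /fluent-bit/bin/fluent-bit -v ...      # -v raises fluent-bit's log verbosity
```

If the shape check fails, the "Domain name not found" error is more likely a configuration problem than a firewall or DNS issue in the cluster.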

lifayt avatar Sep 17 '24 20:09 lifayt

1.9 is a very old version. Can you retry with the latest version? There have been a lot of improvements and changes since.

patrick-stephens avatar Sep 18 '24 14:09 patrick-stephens

Hey Patrick, thank you for the suggestion. Unfortunately we seem to be stuck in a slightly awkward position here because (at the moment) we're limited to the fluent-bit version that's shipped with the amazon-cloudwatch-observability eks addon. That's currently on 2.32.2, which ships the following:

2.32.1
This release includes:
Fluent Bit [1.9.10](https://github.com/fluent/fluent-bit/tree/v1.9.10)
Amazon CloudWatch Logs for Fluent Bit 1.9.4
Amazon Kinesis Streams for Fluent Bit 1.10.2
Amazon Kinesis Firehose for Fluent Bit 1.7.2

We're likely going to investigate adding our own fluent-bit pods in this case, but it would be nice if we could get some guidelines on debugging this issue with 1.9.10 in the meantime, if possible.

Thank you! Linus

lifayt avatar Sep 20 '24 18:09 lifayt

I think you probably want to ask via the actual AWS repo for this, then. There's an open issue on upgrading there too: https://github.com/aws/aws-for-fluent-bit/issues/494

patrick-stephens avatar Sep 23 '24 08:09 patrick-stephens

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions[bot] avatar Dec 26 '24 02:12 github-actions[bot]

This issue was closed because it has been stalled for 5 days with no activity.

github-actions[bot] avatar Jan 01 '25 02:01 github-actions[bot]