fluent-bit pod having difficulty connecting to Splunk HEC endpoint
Bug Report
Describe the bug
We are trying to add a Splunk output to the fluent-bit pods that run as part of the EKS Amazon CloudWatch add-on. We can connect to the HEC endpoint manually with a curl command like so:
curl --request POST \
--url https://example.splunkcloud.com/services/collector \
--header 'Authorization: Splunk <hec-token>' \
--header 'Content-Type: application/json' \
--data '{"index": "airflow", "event": "from-fluent-bit-pod"}'
This produces the expected response:
{"text":"Success","code":0}
Similarly, querying the HEC health endpoint works:
curl --request GET \
--url https://example.splunkcloud.com/services/collector/health
This also produces the expected response:
{"text":"HEC is healthy","code":17}
However, when we try the same thing with the fluent-bit CLI or a config file, we get an error saying the domain name is not found:
[net] getaddrinfo(host='https://example.splunkcloud.com/services/collector', err=4): Domain name not found
Here is an example of how I'm starting fluent-bit:
/fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk -p host=https://example.splunkcloud.com/services/collector -p splunk_token=<token> \
-p tls=on -p tls.verify=off -m '*'
Expected behavior
Since I can connect to the splunk ingestion endpoint using curl, I would expect fluent-bit to also be able to connect.
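One difference between the curl call and the fluent-bit invocation may be worth checking. curl takes a full URL, while the splunk output's `host` property takes a hostname, with `port` as a separate property (default 8088). The error line shows the entire URL being handed to the resolver. Below is a minimal sketch of splitting the URL into those two values. It assumes port 443, because the working curl URL uses https with no explicit port. `hec_host` is a hypothetical helper, not part of Fluent Bit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Fluent Bit): reduce a URL to its bare hostname.
hec_host() {
  url=$1
  rest=${url#*://}              # drop the scheme, e.g. "https://"
  rest=${rest%%/*}              # drop the path, e.g. "/services/collector"
  printf '%s\n' "${rest%%:*}"   # drop an explicit ":port" if present
}

HEC_URL='https://example.splunkcloud.com/services/collector'
HEC_HOST=$(hec_host "$HEC_URL")   # example.splunkcloud.com
HEC_PORT=443                      # assumption: inferred from the https curl URL above

# Print the command instead of running it, so the values can be reviewed first.
echo /fluent-bit/bin/fluent-bit -i cpu -t cpu -o splunk \
  -p host="$HEC_HOST" -p port="$HEC_PORT" -p 'splunk_token=<token>' \
  -p tls=on -p tls.verify=off -m "'*'"
```

Only the host and port are supplied here. The splunk output builds the HEC request path itself, so the `/services/collector` part of the URL is not passed as a property.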
Your Environment
- Version used:
Fluent Bit v1.9.10
Git commit: f4996b8a8e6c82498e95906153738078039c74bd
- Environment name and version (e.g. Kubernetes? What version?): EKS Server Version: v1.28.12-eks-2f46c53, installed as part of the aws cloudwatch eks addon.
It would be particularly helpful if I could get some feedback on how to better diagnose what the issue is here. I work in a corporate environment, so there are always plenty of networking/firewall issues to contend with, but I'm not sure how to get at the guts of what fluent-bit is running into (since my attempts at debugging it by posting events manually to Splunk are all working).
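As a first diagnostic step, the `[net] getaddrinfo(...)` line itself shows exactly what string was handed to DNS. That helps separate a configuration problem from a network or firewall problem. The sketch below pulls that value out of the log line quoted in this report and checks whether it looks like a URL or a plain hostname. `resolver_input` and `looks_like_url` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the host='...' value from a Fluent Bit
# "[net] getaddrinfo(...)" error line.
resolver_input() {
  printf '%s\n' "$1" | sed -n "s/.*getaddrinfo(host='\([^']*\)'.*/\1/p"
}

# Hypothetical helper: succeed if the value contains a scheme or a path,
# i.e. it looks like a URL rather than a bare hostname.
looks_like_url() {
  case $1 in
    *://*|*/*) return 0 ;;
    *) return 1 ;;
  esac
}

# The error line quoted in this report.
LOG_LINE="[net] getaddrinfo(host='https://example.splunkcloud.com/services/collector', err=4): Domain name not found"

value=$(resolver_input "$LOG_LINE")
if looks_like_url "$value"; then
  echo "resolver was given a URL, not a hostname: $value"
else
  echo "resolver was given a plain hostname: $value"
fi
```

If the value turns out to be a plain hostname and resolution still fails, the next suspects would be DNS or egress rules inside the cluster. Starting fluent-bit with `-v` (repeatable, for example `-vv`) raises the log verbosity and may show more of the connection attempt.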
1.9 is a very old version. Can you retry with the latest version? There have been a lot of improvements and changes since then.
Hey Patrick, thank you for the suggestion. Unfortunately we seem to be stuck in a slightly awkward position here because (at the moment) we're limited to the fluent-bit version that's shipped with the amazon-cloudwatch-observability eks addon. That's currently on 2.32.2, which ships the following:
2.32.1
This release includes:
Fluent Bit [1.9.10](https://github.com/fluent/fluent-bit/tree/v1.9.10)
Amazon CloudWatch Logs for Fluent Bit 1.9.4
Amazon Kinesis Streams for Fluent Bit 1.10.2
Amazon Kinesis Firehose for Fluent Bit 1.7.2
We're likely going to investigate adding our own fluent-bit pods in this case, but it would be nice if we could get some guidelines on debugging this issue with 1.9.10 in the meantime, if possible.
Thank you! Linus
In that case, I think you probably want to ask via the actual AWS repo. There's an open issue on upgrading there too: https://github.com/aws/aws-for-fluent-bit/issues/494
This issue was closed because it has been stalled for 5 days with no activity.