prometheus_scrape error decoding Prometheus Text format
Bug Report
Describe the bug
The prometheus_scrape input plugin is unable to parse valid Prometheus text format entries.
To Reproduce
- Example log message:
[2022/06/06 23:42:25] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
- Steps to reproduce the problem:
- Deploy a workload that exposes Prometheus metrics, with a fluent-bit sidecar and the configuration below
- Scraping fails with the error message above
Sample output from the /metrics endpoint:
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 101306368.0
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 25112576.0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1654558876.4
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.38
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 6.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="6",patchlevel="3",version="3.6.3"} 1.0
# HELP http_requests_total Total HTTP Requests (count)
# TYPE http_requests_total counter
http_requests_total{endpoint="/metrics",method="GET",status_code="200"} 2.0
http_requests_total{endpoint="/",method="GET",status_code="200"} 1.0
# HELP http_requests_inprogress Number of in progress HTTP requests
# TYPE http_requests_inprogress gauge
http_requests_inprogress 1.0
# HELP http_request_duration_seconds HTTP request latency (seconds)
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005"} 2.0
http_request_duration_seconds_bucket{le="0.01"} 2.0
http_request_duration_seconds_bucket{le="0.025"} 2.0
http_request_duration_seconds_bucket{le="0.05"} 2.0
http_request_duration_seconds_bucket{le="0.075"} 2.0
http_request_duration_seconds_bucket{le="0.1"} 2.0
http_request_duration_seconds_bucket{le="0.25"} 2.0
http_request_duration_seconds_bucket{le="0.5"} 2.0
http_request_duration_seconds_bucket{le="0.75"} 2.0
http_request_duration_seconds_bucket{le="1.0"} 2.0
http_request_duration_seconds_bucket{le="2.5"} 2.0
http_request_duration_seconds_bucket{le="5.0"} 2.0
http_request_duration_seconds_bucket{le="7.5"} 2.0
http_request_duration_seconds_bucket{le="10.0"} 2.0
http_request_duration_seconds_bucket{le="+Inf"} 2.0
http_request_duration_seconds_count 2.0
http_request_duration_seconds_sum 0.0006913102697581053
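For reference, a rough local stand-in for the endpoint above (a sketch using the Python prometheus_client package; the actual philwinder/prometheus-python code may differ, and the recorded sample values are only illustrative):

# Hypothetical stand-in for the reference app, exposing /metrics on :5000
# with the same metric names as the sample output above.
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

REQUESTS = Counter(
    "http_requests",  # exposed as http_requests_total
    "Total HTTP Requests (count)",
    ["endpoint", "method", "status_code"],
)
IN_PROGRESS = Gauge("http_requests_inprogress", "Number of in progress HTTP requests")
LATENCY = Histogram("http_request_duration_seconds", "HTTP request latency (seconds)")

if __name__ == "__main__":
    start_http_server(5000)
    # Record a few samples so the exposition resembles the dump above.
    REQUESTS.labels(endpoint="/metrics", method="GET", status_code="200").inc()
    REQUESTS.labels(endpoint="/", method="GET", status_code="200").inc()
    IN_PROGRESS.set(1)
    with LATENCY.time():
        time.sleep(0.001)
    while True:
        time.sleep(60)

With the default registry this also exposes the process_* and python_info metrics shown above, so pointing prometheus_scrape at 127.0.0.1:5000 should reproduce the same scrape.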
Expected behavior
The output of http://127.0.0.1:5000/metrics is parsed by the scraper without errors.
Your Environment
- Version used: 1.9.4 and 1.9.3
- Reference "app": https://github.com/philwinder/prometheus-python
- Configuration:
prometheus-configmap.yaml
apiVersion: v1
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        1
        Log_Level    trace
        Parsers_File parsers_json.conf
    @INCLUDE input_*.conf
    [FILTER]
        Name  modify
        Match *
        Add   environment some_env
    [OUTPUT]
        Name  stdout
        Match *
  input_envoy.conf: |
    [INPUT]
        name            prometheus_scrape
        host            127.0.0.1
        port            5000
        tag             envoy
        metrics_path    /metrics
        scrape_interval 30s
    [OUTPUT]
        name  stdout
        match *
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: fluent-bit
    app.kubernetes.io/part-of: logger
  name: pltf-fluent-bit-config
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pytest
  name: pytest
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytest
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: pytest
    spec:
      containers:
      - image: philwinder/prometheus-python
        name: prometheus-python
        ports:
        - containerPort: 5000
      - args:
        - -c
        - /fluent-bit/etc/fluent-bit.conf
        command:
        - /fluent-bit/bin/fluent-bit
        image: fluent/fluent-bit:1.9.3-debug
        imagePullPolicy: IfNotPresent
        name: logger
        resources:
          limits:
            cpu: 50m
            memory: 200Mi
          requests:
            cpu: 14m
            memory: 80Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          runAsUser: 900
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /fluent-bit/etc/fluent-bit.conf
          name: pltf-fluent-bit-envoy-config
          subPath: input_envoy.conf
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: input_envoy.conf
            path: input_envoy.conf
          name: pltf-fluent-bit-config
        name: pltf-fluent-bit-envoy-config
- Environment name and version: Azure Kubernetes Service v1.22.4
- Server type and version: Linux
- Operating System and version: N/A
- Filters and plugins: as above
Additional context
The main goal is to use the prometheus_scrape input to capture Consul Envoy sidecar metrics. We used the example above because it is simpler and shows the same behavior.
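For quicker iteration outside the cluster, the same scrape can be attempted with a standalone fluent-bit binary against the endpoint above (a sketch, assuming fluent-bit 1.9.x is installed locally):

fluent-bit -i prometheus_scrape \
    -p host=127.0.0.1 -p port=5000 \
    -p metrics_path=/metrics \
    -p scrape_interval=5s \
    -o stdout -f 1

This should print the same "error decoding Prometheus Text format" message once per scrape interval.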
cc: @tarruda
The issue was with cmetrics prometheus parser, fix pushed to https://github.com/calyptia/cmetrics/pull/123
@tarruda This issue affects my team as well. Do you know what versions of fluent-bit and cmetrics will be getting the fix?
@MrDrMcCoy the latest version of fluent-bit should have the fix. I'm not sure if cmetrics has had a tagged release since the fix; @edsiper will know more.
@tarruda From what I am seeing, the issue persists with what I understand to be the current version. Here are the details:
Kubernetes version: AKS 1.23.5
Kube-state-metrics helm chart version: 4.10.0
Kube-state-metrics image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
Fluent-bit helm chart version: 0.20.3
Fluent-bit image: cr.fluentbit.io/fluent/fluent-bit:1.9.5
Fluent-bit config snippet:
[INPUT]
    tag             kube_metrics.*
    name            prometheus_scrape
    host            kube-state-metrics.kube-state-metrics.svc.cluster.local
    port            8080
    metrics_path    /metrics
    scrape_interval 15s
Fluent-bit console output:
[2022/07/08 18:49:17] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
[2022/07/08 18:49:32] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
[2022/07/08 18:49:47] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
...
Sample of Prometheus endpoint output:
$ wget -qO- http://kube-state-metrics.kube-state-metrics.svc.cluster.local:8080/metrics | head -n 30
# HELP kube_certificatesigningrequest_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_annotations gauge
# HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
# HELP kube_certificatesigningrequest_created Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
# HELP kube_certificatesigningrequest_cert_length Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
kube_configmap_annotations{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns-autoscaler"} 1
kube_configmap_annotations{namespace="kube-state-metrics",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="cert-manager",configmap="cert-manager-webhook"} 1
kube_configmap_annotations{namespace="ingress",configmap="ingress-controller-leader"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns"} 1
kube_configmap_annotations{namespace="kube-system",configmap="azure-ip-masq-agent-config-reconciled"} 1
kube_configmap_annotations{namespace="prometheus",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="overlay-upgrade-data"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns-custom"} 1
kube_configmap_annotations{namespace="ingress",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_annotations{namespace="cert-manager",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="fluentbit",configmap="fluentbit-fluent-bit"} 1
kube_configmap_annotations{namespace="prometheus",configmap="prometheus-server"} 1
kube_configmap_annotations{namespace="ingress",configmap="nginx-ingress-ingress-nginx-controller"} 1
kube_configmap_annotations{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
Just updated to 1.9.6 and the issue persists exactly as before. Here is a complete dump of the metrics output by kube-state-metrics that Fluent-Bit is failing to parse: kube-state-metrics.log
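In case it helps with debugging, the dump can be replayed locally so prometheus_scrape can be tested against it without a cluster (a hypothetical sketch using only the Python standard library; file name and port are placeholders):

# Serve the attached kube-state-metrics.log dump at http://127.0.0.1:8080/metrics
# so the prometheus_scrape input can be pointed at it directly.
from http.server import BaseHTTPRequestHandler, HTTPServer

DUMP = "kube-state-metrics.log"  # path to the attached metrics dump

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with open(DUMP, "rb") as f:
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), MetricsHandler).serve_forever()

Pointing the prometheus_scrape input at host 127.0.0.1, port 8080 and metrics_path /metrics should then reproduce the decode error with the exact same payload.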
If any more details would be useful, please let me know.
@MrDrMcCoy I just tested your example and can confirm that the parser doesn't handle your case because it contains empty metrics (HELP + TYPE sections without any samples). It is a different issue, though; can you open a separate ticket?
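For reference, such an empty family is easy to generate (a sketch using the Python prometheus_client package; the metric name is only an example): a labelled metric that never has any label values set is exported as HELP and TYPE lines with no sample lines, which is the shape the parser currently rejects.

# A labelled Gauge with no recorded label values is exported as
# "# HELP"/"# TYPE" lines only, mirroring the kube-state-metrics output.
from prometheus_client import CollectorRegistry, Gauge, generate_latest

registry = CollectorRegistry()
Gauge(
    "kube_certificatesigningrequest_annotations",
    "Kubernetes annotations converted to Prometheus labels.",
    ["certificatesigningrequest"],
    registry=registry,
)

print(generate_latest(registry).decode())
# # HELP kube_certificatesigningrequest_annotations Kubernetes annotations converted to Prometheus labels.
# # TYPE kube_certificatesigningrequest_annotations gauge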
The issue was with cmetrics prometheus parser, fix pushed to calyptia/cmetrics#123
Hi @tarruda, thanks for that! I have tested with fluent-bit 1.9.6, but unfortunately I am still having the same issue. I did not see any empty metrics, so I am attaching the full sample (consul-metrics.log); could you please have a look?
Please let me know if you need any additional information and thanks again!
Hi @tarruda, did you get a chance to look into this update? Thanks, and apologies for not sending the full sample before.
@fellipecmwbc I had already pushed the fix to cmetrics, I think it was already merged into the latest fluent-bit version (@edsiper can confirm)
Just to confirm, is this a second fix? I tested the first fix and still had the issue, hence this message. Thanks!
Hi, this is working in version 1.9.9, thanks!