
prometheus_scrape error decoding Prometheus Text format

Open fellipecmwbc opened this issue 3 years ago • 11 comments

Bug Report

Describe the bug

The prometheus_scrape input plugin is unable to parse valid Prometheus entries.

To Reproduce

  • Example log message:
[2022/06/06 23:42:25] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
  • Steps to reproduce the problem:
    1. Deploy something that exposes Prometheus metrics, with a fluent-bit sidecar and its config
    2. Scraping fails with the error message above

Sample output from the /metrics endpoint:

# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 101306368.0
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 25112576.0
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1654558876.4
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.38
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 6.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="6",patchlevel="3",version="3.6.3"} 1.0
# HELP http_requests_total Total HTTP Requests (count)
# TYPE http_requests_total counter
http_requests_total{endpoint="/metrics",method="GET",status_code="200"} 2.0
http_requests_total{endpoint="/",method="GET",status_code="200"} 1.0
# HELP http_requests_inprogress Number of in progress HTTP requests
# TYPE http_requests_inprogress gauge
http_requests_inprogress 1.0
# HELP http_request_duration_seconds HTTP request latency (seconds)
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.005"} 2.0
http_request_duration_seconds_bucket{le="0.01"} 2.0
http_request_duration_seconds_bucket{le="0.025"} 2.0
http_request_duration_seconds_bucket{le="0.05"} 2.0
http_request_duration_seconds_bucket{le="0.075"} 2.0
http_request_duration_seconds_bucket{le="0.1"} 2.0
http_request_duration_seconds_bucket{le="0.25"} 2.0
http_request_duration_seconds_bucket{le="0.5"} 2.0
http_request_duration_seconds_bucket{le="0.75"} 2.0
http_request_duration_seconds_bucket{le="1.0"} 2.0
http_request_duration_seconds_bucket{le="2.5"} 2.0
http_request_duration_seconds_bucket{le="5.0"} 2.0
http_request_duration_seconds_bucket{le="7.5"} 2.0
http_request_duration_seconds_bucket{le="10.0"} 2.0
http_request_duration_seconds_bucket{le="+Inf"} 2.0
http_request_duration_seconds_count 2.0
http_request_duration_seconds_sum 0.0006913102697581053

Expected behavior: the output of http://127.0.0.1:5000/metrics is parsed by the scraper.

Your Environment

  • Version used: 1.9.4 and 1.9.3
  • Reference "app": https://github.com/philwinder/prometheus-python
  • Configuration: prometheus-configmap.yaml
apiVersion: v1
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     trace
        Parsers_File  parsers_json.conf

    @INCLUDE input_*.conf
    [FILTER]
        Name modify
        Match *       
        Add environment some_env
    [OUTPUT]
        Name stdout
        Match *
  input_envoy.conf: |
    [INPUT]
        name prometheus_scrape
        host 127.0.0.1
        port 5000
        tag envoy
        metrics_path /metrics
        scrape_interval 30s
    [OUTPUT]
        name stdout
        match *
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/name: fluent-bit
    app.kubernetes.io/part-of: logger
  name: pltf-fluent-bit-config

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: pytest
  name: pytest
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pytest
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: pytest
    spec:
      containers:
      - image: philwinder/prometheus-python
        name: prometheus-python
        ports:
        - containerPort: 5000
      - args:
        - -c
        - /fluent-bit/etc/fluent-bit.conf
        command:
        - /fluent-bit/bin/fluent-bit
        image: fluent/fluent-bit:1.9.3-debug
        imagePullPolicy: IfNotPresent
        name: logger
        resources:
          limits:
            cpu: 50m
            memory: 200Mi
          requests:
            cpu: 14m
            memory: 80Mi
        securityContext:
          allowPrivilegeEscalation: false
          privileged: false
          runAsUser: 900
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /fluent-bit/etc/fluent-bit.conf
          name: pltf-fluent-bit-envoy-config
          subPath: input_envoy.conf
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: input_envoy.conf
            path: input_envoy.conf
          name: pltf-fluent-bit-config
        name: pltf-fluent-bit-envoy-config
  • Environment name and version: Azure Kubernetes Service v1.22.4
  • Server type and version: Linux
  • Operating System and version: N/A
  • Filters and plugins: as above

Additional context

The main goal is to use the prometheus_scrape input to capture Consul Envoy sidecar metrics. We used the example above because it is simpler and shows the same behavior.

fellipecmwbc · Jun 07 '22 01:06

cc: @tarruda

edsiper · Jun 07 '22 03:06

The issue was with cmetrics prometheus parser, fix pushed to https://github.com/calyptia/cmetrics/pull/123

tarruda · Jun 09 '22 16:06

@tarruda This issue affects my team as well. Do you know what versions of fluent-bit and cmetrics will be getting the fix?

MrDrMcCoy · Jul 08 '22 16:07

@MrDrMcCoy the latest version of fluent-bit should have the fix. I'm not sure whether cmetrics has had a tagged release since the fix; @edsiper will know more.

tarruda · Jul 08 '22 17:07

@tarruda From what I am seeing, the issue persists with what I understand to be the current version. Here are the details:

  • Kubernetes version: AKS 1.23.5
  • Kube-state-metrics helm chart version: 4.10.0
  • Kube-state-metrics image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.5.0
  • Fluent-bit helm chart version: 0.20.3
  • Fluent-bit image: cr.fluentbit.io/fluent/fluent-bit:1.9.5
  • Fluent-bit config snippet:

[INPUT]
    tag kube_metrics.*
    name prometheus_scrape
    host kube-state-metrics.kube-state-metrics.svc.cluster.local
    port 8080
    metrics_path /metrics
    scrape_interval 15s

Fluent-bit console output:

[2022/07/08 18:49:17] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
[2022/07/08 18:49:32] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
[2022/07/08 18:49:47] [error] [input:prometheus_scrape:prometheus_scrape.0] error decoding Prometheus Text format
...

Sample of Prometheus endpoint output:

$ wget -qO- http://kube-state-metrics.kube-state-metrics.svc.cluster.local:8080/metrics | head -n 30
# HELP kube_certificatesigningrequest_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_annotations gauge
# HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
# HELP kube_certificatesigningrequest_created Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
# HELP kube_certificatesigningrequest_cert_length Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
kube_configmap_annotations{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns-autoscaler"} 1
kube_configmap_annotations{namespace="kube-state-metrics",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="cert-manager",configmap="cert-manager-webhook"} 1
kube_configmap_annotations{namespace="ingress",configmap="ingress-controller-leader"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns"} 1
kube_configmap_annotations{namespace="kube-system",configmap="azure-ip-masq-agent-config-reconciled"} 1
kube_configmap_annotations{namespace="prometheus",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="overlay-upgrade-data"} 1
kube_configmap_annotations{namespace="kube-system",configmap="coredns-custom"} 1
kube_configmap_annotations{namespace="ingress",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_annotations{namespace="cert-manager",configmap="kube-root-ca.crt"} 1
kube_configmap_annotations{namespace="fluentbit",configmap="fluentbit-fluent-bit"} 1
kube_configmap_annotations{namespace="prometheus",configmap="prometheus-server"} 1
kube_configmap_annotations{namespace="ingress",configmap="nginx-ingress-ingress-nginx-controller"} 1
kube_configmap_annotations{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1

MrDrMcCoy · Jul 08 '22 19:07

Just updated to 1.9.6 and the issue persists exactly as before. Here is a complete dump of the metrics output by kube-state-metrics that Fluent-Bit is failing to parse: kube-state-metrics.log

If any more details would be useful, please let me know.

MrDrMcCoy · Jul 18 '22 18:07

@MrDrMcCoy I just tested your example and confirm that the parser doesn't handle your case, because it contains empty metric families (HELP + TYPE sections without any samples). It is a different issue, though; can you open a separate ticket?

tarruda · Jul 19 '22 18:07
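For triage, the empty-family case tarruda describes (HELP/TYPE declared but no samples emitted) can be detected with a short stdlib-only sketch. The sample text here is trimmed from the kube-state-metrics output shown earlier in the thread:

```python
# Illustrative sketch: flag metric families that declare a TYPE but
# emit no samples, the case reported to trip the parser.
def empty_families(text):
    declared, sampled = [], set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('# TYPE '):
            declared.append(line.split()[2])
        elif line and not line.startswith('#'):
            # Strip labels/value to recover the metric name; histogram and
            # summary series (_bucket/_count/_sum) still mark their family.
            name = line.split('{')[0].split()[0]
            for suffix in ('_bucket', '_count', '_sum'):
                if name.endswith(suffix):
                    sampled.add(name[:-len(suffix)])
            sampled.add(name)
    return [n for n in declared if n not in sampled]

text = '''# HELP kube_certificatesigningrequest_created Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
kube_configmap_annotations{namespace="kube-system",configmap="coredns"} 1
'''
print(empty_families(text))  # -> ['kube_certificatesigningrequest_created']
```

Running this over a captured /metrics dump shows quickly whether an exposition falls into the empty-family case or is failing for some other reason.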

The issue was with cmetrics prometheus parser, fix pushed to calyptia/cmetrics#123

Hi @tarruda, thanks for that! I have tested with fluent-bit 1.9.6, but unfortunately I am having the same issue. I did not see empty metrics, so I am attaching the full sample (consul-metrics.log); can you please have a look?

Please let me know if you need any additional information and thanks again!

fellipecmwbc · Jul 27 '22 07:07

Hi @tarruda, did you get a chance to look into this, please? Thanks, and apologies for not sending the full sample before.

fellipecmwbc · Aug 10 '22 04:08

@fellipecmwbc I had already pushed the fix to cmetrics, I think it was already merged into the latest fluent-bit version (@edsiper can confirm)

tarruda · Aug 10 '22 12:08

@fellipecmwbc I had already pushed the fix to cmetrics, I think it was already merged into the latest fluent-bit version (@edsiper can confirm)

Just to confirm, is this a second fix? I ask because I have tested the first fix and still got the issue, hence this message. Thanks!

fellipecmwbc · Aug 11 '22 06:08

Hi, this is working in version 1.9.9, thanks!

fellipecmwbc · Oct 04 '22 21:10