stackdriver-prometheus-sidecar

Unrecoverable error when remote writing NGINX Ingress metrics

Open bittermandel opened this issue 5 years ago • 2 comments

We have a basic installation of the sidecar together with the Prometheus Operator, and have configured the sidecar with the following flags:

args:
- --stackdriver.project-id=${PROJECT_ID}
- --prometheus.wal-directory=/prometheus/wal
- --stackdriver.kubernetes.location=${CLUSTER_REGION}
- --stackdriver.kubernetes.cluster-name=${CLUSTER_NAME}
- --include={__name__=~"nginx_.+"}

The metrics we are exporting are the default ones from https://github.com/kubernetes/ingress-nginx. With this setup we get the error below a few times per minute. As a result, no metrics are written to Stackdriver, and the error message does not make the cause clear.

Is there something we're missing when including metrics in that manner?

Thank you!

level=warn ts=2021-01-19T08:30:35.589Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[36].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry."
level=warn ts=2021-01-19T08:30:35.622Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[7].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry."
level=warn ts=2021-01-19T08:30:35.651Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[0].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry."

bittermandel avatar Jan 19 '21 08:01 bittermandel

I have a feeling it is caused by the bucket metrics, which have over 10 labels. I tried to filter them out using --include{__name__!~".+bucket", __name__=~"nginx_.+"}, but without success.
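For anyone debugging the same thing: the two name patterns in the selector above can be checked offline against a few metric names. This is only a rough sketch in Python. It assumes the patterns are fully anchored, as Prometheus regex matchers are, and it says nothing about how the sidecar itself evaluates --include. The helper name matches_include and the sample metric names are made up for illustration.

```python
import re

def matches_include(name: str) -> bool:
    """Return True if a metric name passes both name patterns.

    Hypothetical helper: it mimics __name__=~"nginx_.+" combined with
    __name__!~".+bucket", using fullmatch because Prometheus-style
    regex matchers are anchored at both ends.
    """
    keep = re.fullmatch(r"nginx_.+", name) is not None
    drop = re.fullmatch(r".+bucket", name) is not None
    return keep and not drop

# Illustrative metric names, not taken from a real scrape.
names = [
    "nginx_ingress_controller_requests",
    "nginx_ingress_controller_request_duration_seconds_bucket",
    "nginx_ingress_controller_request_duration_seconds_sum",
    "process_cpu_seconds_total",
]

selected = [n for n in names if matches_include(n)]
for n in selected:
    print(n)
```

Note that under these assumptions the _sum series of a histogram still passes the filter even though its _bucket series is dropped. That may or may not be related to the distribution error in the logs.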

bittermandel avatar Jan 19 '21 09:01 bittermandel

@bittermandel were you able to resolve this? I think I have a similar issue with the following args:

    args:
    - "--stackdriver.project-id=<project>"
    - "--prometheus.wal-directory=/prometheus/wal"
    - "--stackdriver.kubernetes.location=us-east1-b"
    - "--stackdriver.kubernetes.cluster-name=<cluster name>"

jsirianni avatar Nov 12 '21 20:11 jsirianni