Duplicated timeseries in CollectorRegistry with Multiprocess Gunicorn
I know this is a subject that comes up somewhat frequently, but for the life of me I can't figure out what I'm doing wrong.
- I have a service in Amazon ECS that's running a single task with multiple workers (the problem also happens in my other service, which has just one worker).
- I've created the directory and set the PROMETHEUS_MULTIPROC_DIR in the Dockerfile:
RUN mkdir -p /tmp/prom-metrics
ENV PROMETHEUS_MULTIPROC_DIR /tmp/prom-metrics
- I'm using the sample code in the README to create the registry in the /metrics request and return it:
registry = CollectorRegistry()
if getenv('PROMETHEUS_MULTIPROC_DIR'):
    multiprocess.MultiProcessCollector(registry)
data = generate_latest(registry)
status = '200 OK'
response_headers = [
    ('Content-type', CONTENT_TYPE_LATEST),
    ('Content-Length', str(len(data))),
]
return Response(data, status, response_headers)
- I've created the gunicorn.conf.py file with the sample from the README and passed it into my gunicorn startup script via -c:
from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
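Since every step above depends on the directory named by PROMETHEUS_MULTIPROC_DIR, a quick sanity check at startup can rule out basic setup problems. The sketch below uses only the standard library; the function name `inspect_multiproc_dir` and the returned dictionary layout are hypothetical, and it assumes the client keeps its per-process samples in `*.db` files inside that directory. It is a diagnostic aid, not part of prometheus_client.

```python
import os
from pathlib import Path


def inspect_multiproc_dir(env_var="PROMETHEUS_MULTIPROC_DIR"):
    """Report whether the multiprocess directory is usable and what it holds.

    Hypothetical helper for debugging; not part of prometheus_client.
    """
    raw = os.environ.get(env_var)
    if not raw:
        # Variable unset or empty: multiprocess mode is not configured.
        return {"configured": False, "exists": False, "writable": False, "db_files": []}

    path = Path(raw)
    exists = path.is_dir()
    return {
        "configured": True,
        "exists": exists,
        "writable": exists and os.access(path, os.W_OK),
        # Assumption: per-process sample files end in .db. Files listed here
        # before any worker has started were left behind by an earlier run.
        "db_files": sorted(p.name for p in path.glob("*.db")) if exists else [],
    }


if __name__ == "__main__":
    print(inspect_multiproc_dir())
```

Calling this before gunicorn starts its workers shows whether the directory exists, is writable, and still contains files from a previous run, which the README recommends wiping between runs.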
In my two services, gunicorn starts them as follows:
# app 1 with workers
gunicorn -c /app/utils/gunicorn.conf.py -b :5000 -t 3600 --keep-alive 60 --threads 8 --workers 3 app:app
# app 2 with the default single worker (no --workers flag)
gunicorn -c /app/utils/gunicorn.conf.py -b :5000 -t 3600 --keep-alive 60 --threads 8 app:app
The service boots successfully and accepts some metrics which are definitely collected in multiprocess mode, seeing as the HELP line simply displays Multiprocess metric.
This works for a few calls, but eventually I get the dreaded Duplicated timeseries in CollectorRegistry error, and no additional metrics are populated.
What might I be doing wrong?
Interesting. Do you happen to be registering the metrics with registry=registry when creating the metrics? I haven't seen this myself. If you or anyone else has a small code demo that reproduces the issue, that would be great.
No, I did try to create the registry in the global namespace and pass it into the metrics via registry=registry, but this resulted in duplicate metrics (one Multiprocess, and one regular).
I will attempt to create a minimal reproducible example.
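As a starting point for such an example, here is a minimal sketch of one known way to trigger the same error message in a single process: creating a metric with the same name twice on one registry. This is not necessarily what happens in the services described above. The metric name `demo_requests` and the function `register_twice` are hypothetical, and a private registry is used so the demo has no side effects on the default one.

```python
from prometheus_client import CollectorRegistry, Counter


def register_twice():
    """Create the same counter twice on one registry and return the error text."""
    registry = CollectorRegistry()  # private registry, isolated from the default one
    Counter("demo_requests", "Demo request counter", registry=registry)
    try:
        # The second registration of the same name is rejected by the registry.
        Counter("demo_requests", "Demo request counter", registry=registry)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(register_twice())
```

If the error in a real service comes from this pattern, the usual suspects are metric definitions placed inside a request handler or inside a module that is executed more than once in the same worker process.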