
RuntimeError: Read beyond file size detected, /home/git/counter_1000.db is corrupted.

Open heydardsm opened this issue 4 years ago • 4 comments

I have a django-rest project that runs with gunicorn. After my APIs have been working for a while, it throws this error:

 Traceback (most recent call last):
  File "/home/git/app/test/./manage.py", line 21, in <module>
    main()
  File "/home/git/app/test/./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 357, in execute
    django.setup()
  File "/usr/local/lib/python3.9/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/usr/local/lib/python3.9/site-packages/django/apps/registry.py", line 91, in populate
    app_config = AppConfig.create(entry)
  File "/usr/local/lib/python3.9/site-packages/django/apps/config.py", line 90, in create
    module = import_module(entry)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/django_prometheus/__init__.py", line 14, in <module>
    import django_prometheus.middleware
  File "/usr/local/lib/python3.9/site-packages/django_prometheus/middleware.py", line 71, in <module>
    http_streaming_responses = Counter(
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/metrics.py", line 98, in __init__
    self._metric_init()
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/metrics.py", line 232, in _metric_init
    self._value = values.ValueClass(self._type, self._name, self._name + '_total', self._labelnames, self._labelvalues)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/values.py", line 49, in __init__
    self.__reset()
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/values.py", line 63, in __reset
    files[file_prefix] = MmapedDict(filename)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 53, in __init__
    for key, _, pos in self._read_all_values():
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 89, in _read_all_values
    raise RuntimeError(msg % self._fname)
RuntimeError: Read beyond file size detected, /home/git/counter_1000.db is corrupted.

This happened a while after I ran a Celery task from ./manage.py shell. I manage my project with Docker, and this is my docker-compose.yml:

version: '3.7'

services:
  test_db:
    container_name: test_db
    image: postgres:13.1
    restart: unless-stopped
    environment:
      POSTGRES_DB: "test"
    volumes:
      - type: volume
        source: postgres_db
        target: /var/lib/postgresql/data
    env_file: production.env

  test_web:
    container_name: test_web
    build:
      context: .
      target: django
    restart: unless-stopped
    depends_on:
      - test_db
    volumes:
      - .:/home/git/app
    ports:
      - "127.0.0.1:9000:8080"
    command: ["gunicorn", "--reload", "--timeout", "10", "--max-requests", "2000", "--worker-tmp-dir", "/dev/shm", "--workers=8", "--bind=0.0.0.0:8080", "--chdir", "./test", "test.wsgi"]
    cap_drop:
      - ALL
    env_file: production.env

  test_cel_lp:
    container_name: test_cel_lp
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "low_priority", "--autoscale", "2,1"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
    depends_on:
      - test_db
      - test_redis
    env_file: production.env

  test_cel_d:
    container_name: test_cel_d
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "default", "--autoscale", "4,2"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
    depends_on:
      - test_db
      - test_redis
    env_file: production.env

  test_cel_hp:
    container_name: test_cel_hp
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "high_priority", "--autoscale", "8,4"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app

    depends_on:
      - test_db
      - test_redis
    env_file: production.env

  test_cel_beat:
    container_name: test_cel_beat
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "beat", "-l", "info", "-s", "/home/git/db/scheduler.db", "--pidfile=/tmp/beat.pid"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
      - type: volume
        source: beat_scheduler
        target: /home/git/db

    depends_on:
      - test_db
      - test_redis
    env_file: production.env

  test_redis:
    container_name: test_redis
    image: redis:5.0.12
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
    restart: unless-stopped

volumes:
  beat_scheduler:
    name: test_beat
  postgres_db:
    name: test_db

networks:
  default:
    name: test

I use these modules: django-exporter==2.2.2 and prometheus_client==0.5.0.

heydardsm • May 17 '21

Thank you for the report. 0.5.0 is more than two years old; do you still see this issue using more recent versions of this library?

Otherwise, is there anything to note about your environment, such as how your multiproc directory is set and whether it is persisted across runs?

csmarchbanks • May 18 '21

I installed the latest version of django-exporter. I tried to update prometheus_client, but it throws a dependency error.

I didn't understand your second question. Can you explain more?

heydardsm • May 19 '21

Hmm, what is the dependency error? I wonder if django-exporter needs to upgrade a dependency.

My second question was trying to diagnose this issue a bit further. For example, in your docker-compose file I don't see what you are setting the prometheus_multiproc_dir environment variable to; could it be that the database files are persisted across restarts? Any other information you could provide would be great, for example whether you have any unusually long metric labels or very high cardinality series.

There have been lots of bugs fixed with regard to multi-process mode since v0.5.0, so your best bet would be to find a way to upgrade prometheus_client.

csmarchbanks • May 20 '21
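
For reference, here is a minimal gunicorn configuration sketch for the multiprocess setup being asked about, assuming a dedicated scratch directory; the /tmp/prom_multiproc path and the choice to wipe it on every start are assumptions, not details from this thread:

# gunicorn.conf.py - a sketch only; the path and cleanup policy are assumptions
import os
import shutil

# prometheus_client's multiprocess mode reads this variable before the first
# metric is created (older releases used the lowercase prometheus_multiproc_dir)
MULTIPROC_DIR = os.environ.setdefault("PROMETHEUS_MULTIPROC_DIR", "/tmp/prom_multiproc")

def on_starting(server):
    # the directory must be empty when gunicorn starts, otherwise stale
    # .db files left over from a previous run can be picked up again
    shutil.rmtree(MULTIPROC_DIR, ignore_errors=True)
    os.makedirs(MULTIPROC_DIR, exist_ok=True)

def child_exit(server, worker):
    # imported here so the environment variable above is already set
    from prometheus_client import multiprocess
    multiprocess.mark_process_dead(worker.pid)

gunicorn would pick this up by adding -c gunicorn.conf.py to the command shown in the compose file above.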

I just encountered this, and I believe I figured out the cause. The problem is a lack of PID uniqueness between containers: the various mmap'ed files are being shared because each container reuses the same low-value PIDs. Using pid: "host" in docker-compose.yml makes the containers use the host's PID namespace, ensuring uniqueness across containers. There may be other drawbacks.

mkingsbury • Jul 20 '22
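
A compose fragment illustrating the workaround described in the comment above; applying it only to test_web is just an example, and sharing the host PID namespace does loosen container isolation:

services:
  test_web:
    # share the host PID namespace so worker PIDs, and therefore the
    # counter_<pid>.db file names derived from them, stay unique across containers
    pid: "host"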

I am going to close this, as multiprocess mode does require a separate folder for each service.

csmarchbanks • Jan 25 '24
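
As a sketch of the per-service layout referred to in the closing comment, each container could point prometheus_client at its own directory; the directory names and the use of tmpfs (so the files do not survive a restart) are assumptions:

services:
  test_web:
    environment:
      PROMETHEUS_MULTIPROC_DIR: /run/prom/web
    tmpfs:
      - /run/prom/web
  test_cel_d:
    environment:
      PROMETHEUS_MULTIPROC_DIR: /run/prom/celery_default
    tmpfs:
      - /run/prom/celery_default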