RuntimeError: Read beyond file size detected, /home/git/counter_1000.db is corrupted.
I have a django-rest project that runs under gunicorn. After working with my APIs for a while, it throws this error:
Traceback (most recent call last):
  File "/home/git/app/test/./manage.py", line 21, in <module>
    main()
  File "/home/git/app/test/./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.9/site-packages/django/core/management/__init__.py", line 357, in execute
    django.setup()
  File "/usr/local/lib/python3.9/site-packages/django/__init__.py", line 24, in setup
    apps.populate(settings.INSTALLED_APPS)
  File "/usr/local/lib/python3.9/site-packages/django/apps/registry.py", line 91, in populate
    app_config = AppConfig.create(entry)
  File "/usr/local/lib/python3.9/site-packages/django/apps/config.py", line 90, in create
    module = import_module(entry)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/usr/local/lib/python3.9/site-packages/django_prometheus/__init__.py", line 14, in <module>
    import django_prometheus.middleware
  File "/usr/local/lib/python3.9/site-packages/django_prometheus/middleware.py", line 71, in <module>
    http_streaming_responses = Counter(
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/metrics.py", line 98, in __init__
    self._metric_init()
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/metrics.py", line 232, in _metric_init
    self._value = values.ValueClass(self._type, self._name, self._name + '_total', self._labelnames, self._labelvalues)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/values.py", line 49, in __init__
    self.__reset()
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/values.py", line 63, in __reset
    files[file_prefix] = MmapedDict(filename)
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 53, in __init__
    for key, _, pos in self._read_all_values():
  File "/usr/local/lib/python3.9/site-packages/prometheus_client/mmap_dict.py", line 89, in _read_all_values
    raise RuntimeError(msg % self._fname)
RuntimeError: Read beyond file size detected, /home/git/counter_1000.db is corrupted.
This happened a while after running a celery task from ./manage.py shell. I manage the project with Docker; this is my docker-compose.yml:
version: '3.7'
services:
  test_db:
    container_name: test_db
    image: postgres:13.1
    restart: unless-stopped
    environment:
      POSTGRES_DB: "test"
    volumes:
      - type: volume
        source: postgres_db
        target: /var/lib/postgresql/data
    env_file: production.env
  test_web:
    container_name: test_web
    build:
      context: .
      target: django
    restart: unless-stopped
    depends_on:
      - test_db
    volumes:
      - .:/home/git/app
    ports:
      - "127.0.0.1:9000:8080"
    command: ["gunicorn", "--reload", "--timeout", "10", "--max-requests", "2000", "--worker-tmp-dir", "/dev/shm", "--workers=8", "--bind=0.0.0.0:8080", "--chdir", "./test", "test.wsgi"]
    cap_drop:
      - ALL
    env_file: production.env
  test_cel_lp:
    container_name: test_cel_lp
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "low_priority", "--autoscale", "2,1"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
    depends_on:
      - test_db
      - test_redis
    env_file: production.env
  test_cel_d:
    container_name: test_cel_d
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "default", "--autoscale", "4,2"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
    depends_on:
      - test_db
      - test_redis
    env_file: production.env
  test_cel_hp:
    container_name: test_cel_hp
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "worker", "-l", "info", "-Q", "high_priority", "--autoscale", "8,4"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
    depends_on:
      - test_db
      - test_redis
    env_file: production.env
  test_cel_beat:
    container_name: test_cel_beat
    build:
      context: .
      target: django
    working_dir: /home/git/app/test
    command: ["celery", "-A", "test", "beat", "-l", "info", "-s", "/home/git/db/scheduler.db", "--pidfile=/tmp/beat.pid"]
    restart: unless-stopped
    volumes:
      - .:/home/git/app
      - type: volume
        source: beat_scheduler
        target: /home/git/db
    depends_on:
      - test_db
      - test_redis
    env_file: production.env
  test_redis:
    container_name: test_redis
    image: redis:5.0.12
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
    restart: unless-stopped
volumes:
  beat_scheduler:
    name: test_beat
  postgres_db:
    name: test_db
networks:
  default:
    name: test
I use these modules: django-exporter==2.2.2 and prometheus_client==0.5.0.
Thank you for the report. 0.5.0 is more than two years old; do you still see this issue with more recent versions of this library?
Otherwise, is there anything notable in your environment, such as how your multiprocess directory is set, and whether it is persisted across runs?
I installed the latest version of django-exporter. I updated prometheus_client, but it throws a dependency error.
I didn't understand your second question. Can you explain it in more detail?
Hmm, what is the dependency error? I wonder if django-exporter needs to upgrade a dependency.
My second question was trying to diagnose this issue a bit further. For example, in your docker-compose file I don't see what you are setting the prometheus_multiproc_dir environment variable to; could it be that the database files are persisted across restarts? Any other information you could provide would be great, for example whether you have any unusually long metric labels or very high-cardinality series.
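For context on why persistence across restarts matters: in multiprocess mode, prometheus_client writes one mmap-backed .db file per process into that directory, and leftover files from dead processes can later be read back as corrupted metrics. A common mitigation is to empty the directory before the server starts, once, before any metric is created. A minimal sketch (the directory path is an assumption; adapt it to your deployment):

```python
import glob
import os


def clear_multiproc_dir(path):
    """Remove leftover prometheus_client .db files from previous runs.

    Stale files written by processes that no longer exist (or, as in this
    issue, by another container that reused the same PID) can otherwise
    be read back at startup and trigger the corruption error.
    """
    for db_file in glob.glob(os.path.join(path, "*.db")):
        os.unlink(db_file)


# Call this once at startup, before the first metric object is created,
# e.g. from a gunicorn on_starting server hook.
```

This only helps with stale files across restarts; it does not fix PID collisions between simultaneously running containers, which is discussed below in the thread.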
There have been lots of bugs fixed with regards to multi-process since v0.5.0 so your best bet would be to manage to upgrade prometheus_client.
I just encountered this, and I believe I have figured out the cause. The problem is a lack of uniqueness of PIDs between containers: the various mmap'ed files end up shared because each container's processes reuse the same low-value PIDs. Using pid: "host" in the docker-compose.yml makes the containers use the host's PID namespace, ensuring uniqueness across containers. There may be other drawbacks.
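For anyone else landing here, that workaround would look like the following in each service definition (a sketch only; service name taken from the compose file earlier in the thread):

```yaml
services:
  test_web:
    pid: "host"  # share the host PID namespace so PIDs are unique across containers
    # ...rest of the service definition unchanged...
```

Note that sharing the host PID namespace weakens container isolation, which may matter in hardened setups such as this one (the compose file already uses cap_drop: ALL).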
I am going to close this, as multiprocess mode does require a separate directory for each service.
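In other words, each container should get its own multiprocess directory rather than sharing one. A sketch of what that could look like in the compose file (the directory path is an illustration; also check which spelling your prometheus_client version reads, prometheus_multiproc_dir in older releases or PROMETHEUS_MULTIPROC_DIR in newer ones):

```yaml
services:
  test_web:
    environment:
      prometheus_multiproc_dir: /tmp/prometheus_web  # not shared with other services
    tmpfs:
      - /tmp/prometheus_web  # wiped on container restart, so no stale .db files persist
```

Using a tmpfs mount also addresses the earlier question about persistence across runs, since the directory starts empty on every container start.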