[BUG] Unexpected KeyDB Container Restart with Reason "Completed" in Kubernetes StatefulSet
Describe the bug
When running the image docker.io/eqalpha/keydb:x86_64_v6.3.2 in a Kubernetes StatefulSet, the container unexpectedly terminates with the status "Completed", even though no manual shutdown was initiated.
lastState:
  terminated:
    containerID: containerd://8cb24568ab14bd0579427dd04505ad2fea2a0ea685ef63e4c64579b8f5f92888
    exitCode: 0
    finishedAt: "2025-02-26T08:39:54Z"
    reason: Completed
    startedAt: "2025-01-18T07:30:35Z"
restartCount: 2
KeyDB had been running for over 39 days before this unexpected termination.
To reproduce
- Deploy KeyDB v6.3.2 in a Kubernetes StatefulSet
- Ensure it runs continuously for multiple weeks with default settings
- Observe pod termination with exitCode: 0 and reason: Completed without user intervention
- Check the logs (kubectl logs keydb --previous) for any shutdown or SIGTERM messages (see the commands below)
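For reference, these are the kinds of commands that can be used to inspect the previous instance (the pod name keydb is taken from the report above; adjust as needed):

```sh
# Logs of the previous (terminated) container instance
kubectl logs keydb --previous

# Last termination state and restart count as recorded by Kubernetes
kubectl get pod keydb -o jsonpath='{.status.containerStatuses[0].lastState}'

# Events around the restart (failed probes, evictions, OOM kills show up here)
kubectl describe pod keydb
```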
Expected behavior
KeyDB should not terminate itself unless it is explicitly shut down. The pod should only restart if Kubernetes itself triggers it (SIGTERM).
Logs before the restart:
1:22:S 26 Feb 2025 06:30:06.901 * Background saving terminated with success
1:22:S 26 Feb 2025 06:35:07.089 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 06:35:07.090 * Background saving started
1:2875:S 26 Feb 2025 06:35:07.283 * DB saved on disk
1:2875:S 26 Feb 2025 06:35:07.287 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 06:35:07.291 * Background saving terminated with success
1:22:S 26 Feb 2025 07:00:06.408 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:00:06.408 * Background saving started
1:8730:S 26 Feb 2025 07:00:06.652 * DB saved on disk
1:8730:S 26 Feb 2025 07:00:06.656 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:00:06.711 * Background saving terminated with success
1:22:S 26 Feb 2025 07:05:07.069 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 07:05:07.069 * Background saving started
1:9907:S 26 Feb 2025 07:05:07.298 * DB saved on disk
1:9907:S 26 Feb 2025 07:05:07.302 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:05:07.370 * Background saving terminated with success
1:22:S 26 Feb 2025 07:20:08.090 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:20:08.090 * Background saving started
1:13430:S 26 Feb 2025 07:20:08.538 * DB saved on disk
1:13430:S 26 Feb 2025 07:20:08.540 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:20:08.606 * Background saving terminated with success
1:22:S 26 Feb 2025 07:30:10.975 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 07:30:10.975 * Background saving started
1:15780:S 26 Feb 2025 07:30:11.206 * DB saved on disk
1:15780:S 26 Feb 2025 07:30:11.210 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:30:11.278 * Background saving terminated with success
1:22:S 26 Feb 2025 07:45:12.037 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:45:12.037 * Background saving started
1:19287:S 26 Feb 2025 07:45:12.413 * DB saved on disk
1:19287:S 26 Feb 2025 07:45:12.416 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:45:12.449 * Background saving terminated with success
1:22:S 26 Feb 2025 08:00:12.781 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:00:12.781 * Background saving started
1:22790:S 26 Feb 2025 08:00:12.999 * DB saved on disk
1:22790:S 26 Feb 2025 08:00:13.001 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:00:13.084 * Background saving terminated with success
1:22:S 26 Feb 2025 08:15:14.083 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 08:15:14.084 * Background saving started
1:26341:S 26 Feb 2025 08:15:14.304 * DB saved on disk
1:26341:S 26 Feb 2025 08:15:14.307 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:15:14.386 * Background saving terminated with success
1:22:S 26 Feb 2025 08:30:11.271 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:30:11.271 * Background saving started
1:29848:S 26 Feb 2025 08:30:11.488 * DB saved on disk
1:29848:S 26 Feb 2025 08:30:11.490 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:30:11.574 * Background saving terminated with success
1:22:S 26 Feb 2025 08:39:24.002 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:39:24.003 * Background saving started
1:32023:S 26 Feb 2025 08:39:24.253 * DB saved on disk
1:32023:S 26 Feb 2025 08:39:24.257 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:39:24.307 * Background saving terminated with success
1:signal-handler (1740559193) Received SIGTERM scheduling shutdown...
1:22:S 26 Feb 2025 08:39:53.711 # User requested shutdown...
1:22:S 26 Feb 2025 08:39:53.711 * Saving the final RDB snapshot before exiting.
1:22:S 26 Feb 2025 08:39:54.027 * DB saved on disk
1:22:S 26 Feb 2025 08:39:54.027 * Removing the pid file.
1:22:S 26 Feb 2025 08:39:54.029 # KeyDB is now ready to exit, bye bye...
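The epoch in the signal-handler line lines up with the shutdown messages that follow, so the SIGTERM arrived immediately before the clean exit:

```sh
# Convert the epoch from the "Received SIGTERM scheduling shutdown" line to UTC
date -u -d @1740559193
# -> Wed Feb 26 08:39:53 UTC 2025 (exact output format depends on locale),
#    one second before finishedAt: "2025-02-26T08:39:54Z" in the container status above
```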
Additional information
KeyDB Version: v6.3.2
Deployment type: Kubernetes StatefulSet
Liveness Probe:
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - /health/ping_liveness_local.sh 5
  failureThreshold: 5
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 6
Readiness Probe:
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - /health/ping_readiness_local.sh 1
  failureThreshold: 5
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 2
Persistence Enabled: Yes (PVC Attached)
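Both probes exec /health/ping_liveness_local.sh and /health/ping_readiness_local.sh with a timeout argument. I don't have the exact scripts in front of me, but in the usual chart setups they are little more than a wrapper around a ping, roughly like this (a sketch only, assuming a Bitnami-style script and the default port 6379; the real scripts may differ):

```sh
#!/bin/sh
# Hypothetical sketch of /health/ping_liveness_local.sh -- "$1" is the timeout in seconds
# (assumption: chart-provided script, keydb-cli available in the image)
response=$(timeout "$1" keydb-cli -h 127.0.0.1 -p 6379 ping)
if [ "$response" != "PONG" ]; then
  echo "$response"
  exit 1
fi
```

Whether a failing probe is the cause or just a symptom should be visible in the pod events, since failed exec probes are recorded there and not in KeyDB's own log.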
Hey, I'm experiencing the exact same issue, but in my case, it happens after 52 days instead of 39.
Same behavior:
- The container suddenly terminates with status "Completed", even though no manual shutdown was triggered. Exit code 0, reason Completed.
- Logs show normal background saving, then suddenly a SIGTERM and a "User requested shutdown..." message.
- Running KeyDB v6.3.1 in a Kubernetes StatefulSet with persistence enabled.
- Also using the ReJSON module v2.0.11; KeyDB was compiled from source on Ubuntu 20.

So this issue isn't limited to v6.3.2, since I'm seeing it in v6.3.1 as well. Could it be related to ReJSON or to the way it was compiled? Has anyone found a workaround?
Have you checked syslog to see what might be sending the SIGTERM, maybe the OOM killer?
Also check your probes; maybe they failed at that time.
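For example, something along these lines (pod name and grep filters are just placeholders):

```sh
# Kubernetes events for the pod: failed probes and kubelet "Killing" events land here
kubectl get events --field-selector involvedObject.name=keydb --sort-by=.lastTimestamp

# On the node that hosted the pod: did the OOM killer fire around the restart time?
dmesg -T | grep -i -E 'out of memory|oom|killed process'

# kubelet's own view of why it stopped the container (systemd-based nodes)
journalctl -u kubelet | grep -i -E 'killing|liveness'
```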
I'm also experiencing the same issue here, but in my case the readiness probe fails every 5 minutes until there's no workload, and the keydb container gets restarted randomly without any clue (just exit code 0, same as you guys). I got the log below from a previous pod:
1:signal-handler (1744805291) Received SIGTERM scheduling shutdown...
NOTICE: Detuning locks due to high load per core: 127.68%
NOTICE: Detuning locks due to high load per core: 163.28%
NOTICE: Detuning locks due to high load per core: 165.28%
NOTICE: Detuning locks due to high load per core: 151.22%
NOTICE: Detuning locks due to high load per core: 120.62%
NOTICE: Detuning locks due to high load per core: 192.66%
NOTICE: Detuning locks due to high load per core: 159.19%
1:25:M 16 Apr 2025 12:08:14.122 # User requested shutdown...
1:25:M 16 Apr 2025 12:08:14.122 * Removing the pid file.
1:25:M 16 Apr 2025 12:08:14.197 # KeyDB is now ready to exit, bye bye...
It seems there's no workaround so far.