
[BUG] Unexpected KeyDB Container Restart with Reason "Completed" in Kubernetes StatefulSet

Open persikida opened this issue 10 months ago • 4 comments

Describe the bug

When running the image docker.io/eqalpha/keydb:x86_64_v6.3.2 in a Kubernetes StatefulSet, the container unexpectedly terminates with the status "Completed", even though no manual shutdown was initiated.

lastState:
  terminated:
    containerID: containerd://8cb24568ab14bd0579427dd04505ad2fea2a0ea685ef63e4c64579b8f5f92888
    exitCode: 0
    finishedAt: "2025-02-26T08:39:54Z"
    reason: Completed
    startedAt: "2025-01-18T07:30:35Z"
restartCount: 2

KeyDB had been running for over 39 days before this unexpected termination.

To reproduce

  1. Deploy KeyDB v6.3.2 in a Kubernetes StatefulSet
  2. Ensure it runs continuously for multiple weeks with default settings
  3. Observe pod termination with exitCode: 0 and reason: Completed without user intervention
  4. Check logs (kubectl logs keydb --previous) for any shutdown and SIGTERM messages
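
For step 4, commands along these lines can be used to pull the logs of the terminated container instance and its last recorded exit state (the pod name keydb-0 and the namespace are placeholders for your own values):

# Logs from the container instance that just terminated
kubectl logs keydb-0 -n <namespace> --previous

# Exit code, reason and timestamps of the last termination
kubectl get pod keydb-0 -n <namespace> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'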

Expected behavior

KeyDB should not terminate itself unless explicitly shut down, and the pod should only restart if Kubernetes itself triggers it (SIGTERM).

Logs before the restart:

1:22:S 26 Feb 2025 06:30:06.901 * Background saving terminated with success
1:22:S 26 Feb 2025 06:35:07.089 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 06:35:07.090 * Background saving started
1:2875:S 26 Feb 2025 06:35:07.283 * DB saved on disk
1:2875:S 26 Feb 2025 06:35:07.287 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 06:35:07.291 * Background saving terminated with success
1:22:S 26 Feb 2025 07:00:06.408 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:00:06.408 * Background saving started
1:8730:S 26 Feb 2025 07:00:06.652 * DB saved on disk
1:8730:S 26 Feb 2025 07:00:06.656 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:00:06.711 * Background saving terminated with success
1:22:S 26 Feb 2025 07:05:07.069 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 07:05:07.069 * Background saving started
1:9907:S 26 Feb 2025 07:05:07.298 * DB saved on disk
1:9907:S 26 Feb 2025 07:05:07.302 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:05:07.370 * Background saving terminated with success
1:22:S 26 Feb 2025 07:20:08.090 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:20:08.090 * Background saving started
1:13430:S 26 Feb 2025 07:20:08.538 * DB saved on disk
1:13430:S 26 Feb 2025 07:20:08.540 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:20:08.606 * Background saving terminated with success
1:22:S 26 Feb 2025 07:30:10.975 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 07:30:10.975 * Background saving started
1:15780:S 26 Feb 2025 07:30:11.206 * DB saved on disk
1:15780:S 26 Feb 2025 07:30:11.210 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:30:11.278 * Background saving terminated with success
1:22:S 26 Feb 2025 07:45:12.037 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 07:45:12.037 * Background saving started
1:19287:S 26 Feb 2025 07:45:12.413 * DB saved on disk
1:19287:S 26 Feb 2025 07:45:12.416 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 07:45:12.449 * Background saving terminated with success
1:22:S 26 Feb 2025 08:00:12.781 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:00:12.781 * Background saving started
1:22790:S 26 Feb 2025 08:00:12.999 * DB saved on disk
1:22790:S 26 Feb 2025 08:00:13.001 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:00:13.084 * Background saving terminated with success
1:22:S 26 Feb 2025 08:15:14.083 * 1 changes in 900 seconds. Saving...
1:22:S 26 Feb 2025 08:15:14.084 * Background saving started
1:26341:S 26 Feb 2025 08:15:14.304 * DB saved on disk
1:26341:S 26 Feb 2025 08:15:14.307 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:15:14.386 * Background saving terminated with success
1:22:S 26 Feb 2025 08:30:11.271 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:30:11.271 * Background saving started
1:29848:S 26 Feb 2025 08:30:11.488 * DB saved on disk
1:29848:S 26 Feb 2025 08:30:11.490 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:30:11.574 * Background saving terminated with success
1:22:S 26 Feb 2025 08:39:24.002 * 10 changes in 300 seconds. Saving...
1:22:S 26 Feb 2025 08:39:24.003 * Background saving started
1:32023:S 26 Feb 2025 08:39:24.253 * DB saved on disk
1:32023:S 26 Feb 2025 08:39:24.257 * RDB: 40 MB of memory used by copy-on-write
1:22:S 26 Feb 2025 08:39:24.307 * Background saving terminated with success
1:signal-handler (1740559193) Received SIGTERM scheduling shutdown...
1:22:S 26 Feb 2025 08:39:53.711 # User requested shutdown...
1:22:S 26 Feb 2025 08:39:53.711 * Saving the final RDB snapshot before exiting.
1:22:S 26 Feb 2025 08:39:54.027 * DB saved on disk
1:22:S 26 Feb 2025 08:39:54.027 * Removing the pid file.
1:22:S 26 Feb 2025 08:39:54.029 # KeyDB is now ready to exit, bye bye...

Additional information

KeyDB Version: v6.3.2
Deployment type: Kubernetes StatefulSet

Liveness Probe:

livenessProbe:
  exec:
    command:
      - sh
      - -c
      - /health/ping_liveness_local.sh 5
  failureThreshold: 5
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 6

Readiness Probe:

readinessProbe:
  exec:
    command:
      - sh
      - -c
      - /health/ping_readiness_local.sh 1
  failureThreshold: 5
  initialDelaySeconds: 20
  periodSeconds: 5
  successThreshold: 1
  timeoutSeconds: 2

Persistence Enabled: Yes (PVC Attached)

persikida avatar Mar 05 '25 14:03 persikida

Hey, I'm experiencing the exact same issue, but in my case, it happens after 52 days instead of 39.

Same behavior:

  - The container suddenly terminates with status "Completed", even though no manual shutdown was triggered (exit code 0, reason: Completed).
  - Logs show normal background saving, then suddenly a SIGTERM and a "User requested shutdown..." message.
  - Running KeyDB v6.3.1 in a Kubernetes StatefulSet with persistence enabled.
  - Also using the ReJSON module v2.0.11; KeyDB was compiled from source on Ubuntu 20.

So it seems this issue isn't limited to v6.3.2, since I'm seeing it in v6.3.1 as well. Could this be related to ReJSON or to how KeyDB was compiled? Has anyone found a workaround?

agosalvez avatar Mar 05 '25 15:03 agosalvez

Have you checked syslog to see what is sending the SIGTERM, maybe the OOM killer?
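
Something along these lines can be used to check for that (this requires access to the node the pod was running on; log paths vary by distro, and keydb-0 / the namespace are placeholders):

# Which node the pod is scheduled on
kubectl get pod keydb-0 -n <namespace> -o wide

# On that node: kernel log, syslog and kubelet log around the time of the restart
dmesg -T | grep -i -E 'oom|killed process'
grep -i -E 'oom|keydb' /var/log/syslog
journalctl -u kubelet --since "2025-02-26 08:30" | grep -i -E 'oom|killing'

Note that an OOM-killed container is normally reported with reason OOMKilled and exit code 137 rather than Completed / exit code 0, so a clean exit here points more towards something deliberately sending SIGTERM.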

keithchew avatar Mar 07 '25 05:03 keithchew

Also check your probes; maybe they failed at that time.
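
Probe failures and the resulting container kills usually show up as Unhealthy / Killing events on the pod, for example (keydb-0 and the namespace are placeholders; by default events are only retained for about an hour, so this has to be checked soon after a restart):

kubectl describe pod keydb-0 -n <namespace>
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=keydb-0 \
  --sort-by=.lastTimestamp | grep -i -E 'unhealthy|probe|killing'

A failed liveness probe would also explain the symptom in the original report: the kubelet sends SIGTERM, KeyDB shuts down cleanly, and the container ends with exit code 0 and reason Completed.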

artarik avatar Mar 19 '25 07:03 artarik

I'm also experiencing the same issue here. In my case the readiness probe fails every 5 minutes until there's no workload, and the keydb container gets restarted randomly without any clue (just exit code 0, the same as you). I got the log below from the previous pod:

1:signal-handler (1744805291) Received SIGTERM scheduling shutdown...
NOTICE: Detuning locks due to high load per core: 127.68%
NOTICE: Detuning locks due to high load per core: 163.28%
NOTICE: Detuning locks due to high load per core: 165.28%
NOTICE: Detuning locks due to high load per core: 151.22%
NOTICE: Detuning locks due to high load per core: 120.62%
NOTICE: Detuning locks due to high load per core: 192.66%
NOTICE: Detuning locks due to high load per core: 159.19%
1:25:M 16 Apr 2025 12:08:14.122 # User requested shutdown...
1:25:M 16 Apr 2025 12:08:14.122 * Removing the pid file.
1:25:M 16 Apr 2025 12:08:14.197 # KeyDB is now ready to exit, bye bye...

Seems like there's no workaround till now.

batulziiy avatar Apr 16 '25 12:04 batulziiy