Performance issue with keydb
Hello. I launched a dockerized instance of KeyDB from the image eqalpha/keydb:x86_64_v6.0.16 and configured active-replica yes.
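For reference, active replication pairs the two nodes symmetrically; host-a's side is in the full config below, and host-b mirrors it. A sketch of the relevant lines on each host:

# on host-a
active-replica yes
replicaof host-b 6379

# on host-b (mirror of the above)
active-replica yes
replicaof host-a 6379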
Then I copied the AOF file over and started the instance with this docker-compose file:
version: '3.4'
services:
  keydb:
    image: eqalpha/keydb:x86_64_v6.0.16
    container_name: keydb
    restart: unless-stopped
    security_opt:
      - seccomp:unconfined
    network_mode: host
    volumes:
      - /db/keydb/:/data/
      - type: bind
        source: ./keydb.conf
        target: /etc/keydb/keydb.conf
    logging:
      driver: "json-file"
      options:
        max-file: "5"
        max-size: 10m
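The container is brought up and checked the usual way (docker-compose v1 syntax; docker compose works the same):

docker-compose up -d
docker logs -f keydb   # watch startup and AOF loading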
keydb.conf is:
bind 0.0.0.0
protected-mode no
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
supervised no
pidfile /var/run/keydb_6379.pid
loglevel notice
databases 16
always-show-logo yes
save ""
save ""
save ""
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /data
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 25
auto-aof-rewrite-min-size 7gb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 1024mb 1024mb 0
client-output-buffer-limit pubsub 128mb 64mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
server-threads 4
rename-command FLUSHDB ""
rename-command FLUSHALL ""
requirepass "**"
masterauth "**"
active-replica yes
replicaof host-b 6379
replica-announce-ip host-a-ip
replica-announce-port 6379
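To confirm the pair is healthy after startup, replication state can be checked from either side (a sketch; ** stands for the real password, as everywhere above):

keydb-cli -h host-a -a ** info replication
keydb-cli -h host-b -a ** info replication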
It starts without any issues, and there are now about 2.5 million keys. Replication is also fine. All ~400 clients are connected to host-a; host-b is for manual standby. But the latency of GET operations is very poor:
keydb-benchmark -h `hostname -f` -a ** -t get -n 1000
...
95.30% <= 2141 milliseconds
...
100.00% <= 3040 milliseconds
50.08 requests per second
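A longer run with explicit concurrency can be used to rule out warm-up effects (a sketch; -n, -c and -q are the standard keydb-benchmark flags):

keydb-benchmark -h `hostname -f` -a ** -t get -n 100000 -c 50 -q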
I decided to enable the software watchdog with config set watchdog-period 500 and got these records in the log:
1:signal-handler (1621551890)
--- WATCHDOG TIMER EXPIRED ---
EIP:
/lib/x86_64-linux-gnu/libc.so.6(syscall+0x19) [0x7f8f642e2959]
Backtrace:
keydb-server 0.0.0.0:6379(logStackTrace(ucontext_t*)+0x6b) [0x556bd5ca592b]
keydb-server 0.0.0.0:6379(watchdogSignalHandler(int, siginfo_t*, void*)+0x1d) [0x556bd5ca59cd]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0) [0x7f8f645ca8a0]
/lib/x86_64-linux-gnu/libc.so.6(syscall+0x19) [0x7f8f642e2959]
keydb-server 0.0.0.0:6379(fastlock_sleep+0xa4) [0x556bd5d00624]
keydb-server 0.0.0.0:6379(+0x110399) [0x556bd5d06399]
keydb-server 0.0.0.0:6379(aeProcessEvents+0x2a7) [0x556bd5c43e97]
keydb-server 0.0.0.0:6379(aeMain+0x45) [0x556bd5c442a5]
keydb-server 0.0.0.0:6379(workerThreadMain(void*)+0x74) [0x556bd5c4ac34]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f8f645bf6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f8f642e8a3f]
1:signal-handler (1621551890) --------
(The identical backtrace repeats a few seconds later, at timestamp 1621551893.)
Both servers are bare metal with 20 CPUs, 128 GB RAM, and a 10 Gbps network; the OS is CentOS Linux release 7.9.2009 (Core).
I set somaxconn with sysctl -w net.core.somaxconn=1024 and disabled transparent hugepages with echo never | tee /sys/kernel/mm/transparent_hugepage/enabled.
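To make these kernel settings persist across reboots, something like the following can be used (a sketch; the file names are my choice, the locations are the conventional ones on CentOS 7):

# /etc/sysctl.d/99-keydb.conf
net.core.somaxconn = 1024

# appended to /etc/rc.local (must be executable)
echo never > /sys/kernel/mm/transparent_hugepage/enabled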
top shows keydb-server consuming 90-140% CPU.
The first lines of perf top:
Samples: 57K of event 'cycles:ppp', 4000 Hz, Event count (approx.): 26274076884 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
37.75% keydb-server [.] 0x0000000000077297
15.24% keydb-server [.] 0x0000000000050cf9
7.92% keydb-server [.] 0x0000000000110388
1.36% keydb-server [.] 0x0000000000077294
1.14% keydb-server [.] 0x000000000011038d
I have no idea how to reproduce this; can you help me find out what I am doing wrong?
I have been having the same issue recently: the CPU load on one of the clusters is insanely high, and it is almost impossible to run anything on it.
Meanwhile, the dump.rdb file stops updating because of it.
Hi Akosyrev & Lubard
Thank you for contacting EQAlpha. We appreciate you reaching out to us.
- For starters, you can raise
server-threads 7
See: https://docs.keydb.dev/blog/2019/10/28/blog-post
However, since you have 20 CPUs (and, I assume, 20 physical cores), you can increase it to as many as
server-threads 20
- Uncomment
server-thread-affinity true
to pin worker threads to cores and optimize CPU usage (see the combined snippet after this list).
- I also see some configuration parameters that add extra work; you could try disabling them for a potential performance improvement.
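Putting the first two suggestions together, the relevant keydb.conf lines would be (a sketch; 20 threads assumes 20 physical cores, as noted above):

server-threads 20
server-thread-affinity true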