Source is a Redis 5.0 cluster, target is a Redis 6.0 cluster; syncing with cluster_helper.py fails with an error
-
Environment
Source: Redis 5.0 cluster, three masters and three replicas
Target: Redis 6.0 cluster, three masters and three replicas
Python version: 3.6.15
-
The sync fails with an error.

-
Full console log:
[root@redis-cluster cluster_helper]# /usr/local/python3.6/bin/python3 cluster_helper.py ../redis-shake ../sync.toml
redis-shake path: ../redis-shake
redis-shake abs path: /root/redis5.0.14-cluster/redis-shake/redis-shake
{'type': 'sync', 'source': {'version': 5.0, 'address': '192.168.31.116:7000', 'username': '', 'password': '', 'tls': False, 'elasticache_psync': ''}, 'target': {'type': 'cluster', 'version': 6.0, 'address': '192.168.31.116:6080', 'username': '', 'password': '', 'tls': False}, 'advanced': {'dir': 'data', 'ncpu': 4, 'pprof_port': 0, 'metrics_port': 0, 'log_file': 'redis-shake.log', 'log_level': 'debug', 'log_interval': 5, 'rdb_restore_command_behavior': 'rewrite', 'pipeline_count_limit': 1024, 'target_redis_client_max_querybuf_len': 1024000000, 'target_redis_proto_max_bulk_len': 512000000}}
host: 192.168.31.116, port: 7000, username: , password: , tls: False
cluster nodes: {'192.168.31.116:7001': {'node_id': 'aef020dbf74080b13de63cf8538d9206b632bf3d', 'flags': 'myself,master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326623000', 'epoch': '2', 'slots': [['5461', '10922']], 'migrations': [], 'connected': True}, '192.168.31.116:7003': {'node_id': 'c3c04e5d875dc2eb1efd78b0c56a05d69e58b05d', 'flags': 'slave', 'master_id': 'fd5ba061601a17d9e848e321508c5633b9b1c219', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326623403', 'epoch': '4', 'slots': [], 'migrations': [], 'connected': True}, '192.168.31.116:7004': {'node_id': 'e05ac64a4cb23fa6cb4a118352e9863916231c62', 'flags': 'slave', 'master_id': '0b2055c261e33a4478b7d3462ca6a767cfabe249', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326623404', 'epoch': '5', 'slots': [], 'migrations': [], 'connected': True}, '192.168.31.116:7005': {'node_id': '5806d2d9f0bdc8bd6edd47f4254c122b61c57b0c', 'flags': 'slave', 'master_id': 'aef020dbf74080b13de63cf8538d9206b632bf3d', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326622000', 'epoch': '6', 'slots': [], 'migrations': [], 'connected': True}, '192.168.31.116:7002': {'node_id': 'fd5ba061601a17d9e848e321508c5633b9b1c219', 'flags': 'master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326622000', 'epoch': '3', 'slots': [['10923', '16383']], 'migrations': [], 'connected': True}, '192.168.31.116:7000': {'node_id': '0b2055c261e33a4478b7d3462ca6a767cfabe249', 'flags': 'master', 'master_id': '-', 'last_ping_sent': '0', 'last_pong_rcvd': '1667326622390', 'epoch': '1', 'slots': [['0', '5460']], 'migrations': [], 'connected': True}}
addresses: 192.168.31.116:7001 192.168.31.116:7002 192.168.31.116:7000
start syncing...
sleep 3 seconds to wait redis-shake start
================ 2022-11-07 18:48:04 ================
hello 11008
get metrics from [192.168.31.116:7001] failed: HTTPConnectionPool(host='localhost', port=11008): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2ebf09128>: Failed to establish a new connection: [Errno 111] Connection refused',))
hello 11009
get metrics from [192.168.31.116:7002] failed: HTTPConnectionPool(host='localhost', port=11009): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2ebf09828>: Failed to establish a new connection: [Errno 111] Connection refused',))
hello 11010
get metrics from [192.168.31.116:7000] failed: HTTPConnectionPool(host='localhost', port=11010): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc2ebf09f60>: Failed to establish a new connection: [Errno 111] Connection refused',))
no redis-shake is running
Waiting for process 56811 to exit...
process 56811 exited.
Waiting for process 56812 to exit...
process 56812 exited.
Waiting for process 56818 to exit...
process 56818 exited.
-
stderr
[root@redis-cluster 192_168_31_116_7000]# cat stderr
panic: dial tcp 127.0.0.1:6080: connect: connection refused

goroutine 1 [running]:
github.com/rs/zerolog.(*Logger).Panic.func1({0xc00009a0c0, 0x0})
    github.com/rs/[email protected]/log.go:359 +0x2d
github.com/rs/zerolog.(*Event).msg(0xc0000900c0, {0xc00009a0c0, 0x34})
    github.com/rs/[email protected]/event.go:156 +0x2b8
github.com/rs/zerolog.(*Event).Msg(...)
    github.com/rs/[email protected]/event.go:108
github.com/alibaba/RedisShake/internal/log.logFinally(0xc0000900c0, {0xc00009a080, 0xc00008e0e0}, {0x0, 0x17, 0xc0000a2048})
    github.com/alibaba/RedisShake/internal/log/func.go:77 +0x53
github.com/alibaba/RedisShake/internal/log.Panicf({0xc00009a080, 0x34}, {0x0, 0x0, 0x0})
    github.com/alibaba/RedisShake/internal/log/func.go:27 +0x57
github.com/alibaba/RedisShake/internal/log.PanicError({0x81bc20, 0xc0000ba050})
    github.com/alibaba/RedisShake/internal/log/func.go:31 +0x33
github.com/alibaba/RedisShake/internal/client.NewRedisClient({0xc0000b2189, 0xe}, {0x0, 0x0}, {0x0, 0x0}, 0x0)
    github.com/alibaba/RedisShake/internal/client/redis.go:32 +0x165
github.com/alibaba/RedisShake/internal/writer.NewRedisWriter({0xc0000b2189, 0xe}, {0x0, 0x0}, {0x0, 0x0}, 0x0)
    github.com/alibaba/RedisShake/internal/writer/redis.go:30 +0x87
github.com/alibaba/RedisShake/internal/writer.(*RedisClusterWriter).loadClusterNodes(0xc000200000, {0xc0000181b0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, 0xa8)
    github.com/alibaba/RedisShake/internal/writer/redis_cluster.go:58 +0x45e
github.com/alibaba/RedisShake/internal/writer.NewRedisClusterWriter({0xc0000181b0, 0x13}, {0x0, 0x0}, {0x0, 0x0}, 0x10)
    github.com/alibaba/RedisShake/internal/writer/redis_cluster.go:23 +0x8d
main.main()
    github.com/alibaba/RedisShake/cmd/redis-shake/main.go:75 +0x3c9
-
stdout
[root@redis-cluster 192_168_31_116_7000]# cat stdout
2022-11-07 18:48:01 INF GOOS: linux, GOARCH: amd64
2022-11-07 18:48:01 INF Ncpu: 4, GOMAXPROCS: 4
2022-11-07 18:48:01 INF pid: 56818
2022-11-07 18:48:01 INF pprof_port: 0
2022-11-07 18:48:01 INF No lua file specified, will not filter any cmd.
2022-11-07 18:48:01 INF metrics url: http://localhost:11010
2022-11-07 18:48:01 INF no password. address=[192.168.31.116:6080]
2022-11-07 18:48:01 INF redisClusterWriter load cluster nodes. line=2881e7d06e1b5e0e73d7dfd195d6764b0aba708f 127.0.0.1:6080@16080 myself,master - 0 1667326621000 1 connected 10926-16383
2022-11-07 18:48:01 PNC dial tcp 127.0.0.1:6080: connect: connection refused
[root@redis-cluster 192_168_31_116_7000]#
-
sync.toml
[root@redis-cluster 192_168_31_116_7000]# cat sync.toml
type = "sync"

[source]
version = 5.0
address = "192.168.31.116:7000"
username = ""
password = ""
tls = false
elasticache_psync = ""

[target]
type = "cluster"
version = 6.0
address = "192.168.31.116:6080"
username = ""
password = ""
tls = false

[advanced]
dir = "data"
ncpu = 4
pprof_port = 0
metrics_port = 11010
log_file = "redis-shake.log"
log_level = "debug"
log_interval = 5
rdb_restore_command_behavior = "rewrite"
pipeline_count_limit = 1024
target_redis_client_max_querybuf_len = 1024000000
target_redis_proto_max_bulk_len = 512000000
-
redis-shake.log
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"GOOS: linux, GOARCH: amd64"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"Ncpu: 4, GOMAXPROCS: 4"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"pid: 56818"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"pprof_port: 0"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"No lua file specified, will not filter any cmd."}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"metrics url: http://localhost:11010"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"no password. address=[192.168.31.116:6080]"}
{"level":"info","time":"2022-11-07T18:48:01+08:00","message":"redisClusterWriter load cluster nodes. line=2881e7d06e1b5e0e73d7dfd195d6764b0aba708f 127.0.0.1:6080@16080 myself,master - 0 1667326621000 1 connected 10926-16383"}
{"level":"panic","time":"2022-11-07T18:48:01+08:00","message":"dial tcp 127.0.0.1:6080: connect: connection refused"}
The cluster nodes table of the cluster at 192.168.31.116:6080 looks wrong. Is this a self-built cluster?
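
For context: the stdout log shows the target node reporting itself as 127.0.0.1:6080 in its CLUSTER NODES output, and redis-shake then dials that address and is refused. A rough way to spot this before starting a sync is to scan the CLUSTER NODES text for loopback addresses. The sketch below is only an illustration: `parse_cluster_nodes` and `loopback_nodes` are hypothetical helper names (not part of cluster_helper.py or RedisShake), and it assumes you have already captured the raw output, for example with `redis-cli -h 192.168.31.116 -p 6080 cluster nodes`.

```python
import ipaddress


def parse_cluster_nodes(text):
    """Parse raw CLUSTER NODES output into a list of dicts (hypothetical helper)."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a well-formed node line
        # The address field looks like ip:port@cport; keep only ip:port.
        addr = fields[1].split("@")[0]
        host, _, port = addr.rpartition(":")
        nodes.append({
            "node_id": fields[0],
            "host": host,
            "port": int(port) if port.isdigit() else 0,
            "flags": fields[2].split(","),
        })
    return nodes


def loopback_nodes(nodes):
    """Return the nodes whose advertised host is a loopback address (hypothetical helper)."""
    bad = []
    for node in nodes:
        try:
            if ipaddress.ip_address(node["host"]).is_loopback:
                bad.append(node)
        except ValueError:
            # Not an IP literal: treat an empty host or "localhost" as suspicious.
            if node["host"] in ("", "localhost"):
                bad.append(node)
    return bad


if __name__ == "__main__":
    # The node line taken from the stdout log in this issue.
    sample = (
        "2881e7d06e1b5e0e73d7dfd195d6764b0aba708f 127.0.0.1:6080@16080 "
        "myself,master - 0 1667326621000 1 connected 10926-16383\n"
    )
    for node in loopback_nodes(parse_cluster_nodes(sample)):
        print("loopback address advertised:", node["host"], node["port"])
```

If any node is flagged, clients on other hosts (and possibly on the same host, depending on the `bind` setting) may be unable to reach the address the cluster hands out.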
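Source: Redis 5.0.3 cluster, three masters and three replicas. Target: Redis 6.0.6 cluster, three masters and three replicas. Python version: 3.6
I ran into the same problem. Both sides are self-built clusters. The main cause is that tcp-keepalive on the target 6.0 side defaults to 60, so the connection gets dropped and shake is interrupted.
Change tcp-keepalive in redis.conf to 28800.
That is how I have worked around it for now.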
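
To illustrate the workaround above, here is a minimal sketch that sets `tcp-keepalive 28800` in the text of a redis.conf file. `set_directive` is a hypothetical helper written for this example, and the sample configuration lines are assumptions rather than the reporter's actual file. The server still has to be restarted (or the setting applied at runtime) for the change to take effect.

```python
import re


def set_directive(conf_text, name, value):
    """Return conf_text with `name value` set (hypothetical helper).

    An existing uncommented directive line is replaced; otherwise the
    directive is appended at the end of the text.
    """
    pattern = re.compile(
        r"^[ \t]*" + re.escape(name) + r"[ \t]+.*$", re.MULTILINE
    )
    new_line = "{} {}".format(name, value)
    if pattern.search(conf_text):
        # Use a function as the replacement so the text is inserted literally.
        return pattern.sub(lambda match: new_line, conf_text)
    separator = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return conf_text + separator + new_line + "\n"


if __name__ == "__main__":
    # Assumed sample content, for illustration only.
    conf = "port 6080\ntcp-keepalive 60\ncluster-enabled yes\n"
    print(set_directive(conf, "tcp-keepalive", 28800))
```

Commented-out lines such as `# tcp-keepalive 300` are left alone, because the pattern only matches lines that begin with the directive name.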
@EvanLyu Thanks. If shake waits more than 60s for the source's RDB file, this problem can indeed occur. I'll note a TODO and fix it later.