Backup fails when cloud storage reports `Connection reset by peer`
- What did you do? If possible, provide a recipe for reproducing the error.
When the network connection to GCS is unstable, TiKV receives a `Connection reset by peer` error while saving backup files. This error does not trigger TiKV's retry mechanism, so the backup exits with an error.
TiKV log
[2020/11/04 06:29:58.792 +00:00] [ERROR] [endpoint.rs:292] ["backup save file failed"] [err_code=KV:Unknown] [err="Io(Custom { kind: InvalidInput, error: \"invalid HTTP request: connection error: Connection reset by peer (os error 104)\" })"]
[2020/11/04 06:29:58.792 +00:00] [ERROR] [endpoint.rs:694] ["backup region failed"] [err_code=KV:Unknown] [err="Io(Custom { kind: InvalidInput, error: \"invalid HTTP request: connection error: Connection reset by peer (os error 104)\" })"] [end_key=7480000000000000E75F72800000000EE03CD9] [start_key=7480000000000000E75F72800000000EB0EE2D] [region="id: 110409 start_key: 7480000000000000FFE75F72800000000EFFB0EE2D0000000000FA end_key: 7480000000000000FFE75F72800000000EFFE03CD90000000000FA region_epoch { conf_ver: 5 version: 10797 } peers { id: 110410 store_id: 1 } peers { id: 110411 store_id: 4 } peers { id: 110412 store_id: 5 }"]
[2020/11/04 06:29:58.825 +00:00] [ERROR] [client.rs:438] ["failed to send heartbeat"] [err_code=KV:PD:gRPC] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"Connection reset by peer\") })))"]
[2020/11/04 06:29:58.834 +00:00] [ERROR] [util.rs:301] ["request failed, retry"] [err_code=KV:Unknown] [err="Grpc(RpcFinished(Some(RpcStatus { status: 14-UNAVAILABLE, details: Some(\"Connection reset by peer\") })))"]
[2020/11/04 06:29:58.856 +00:00] [ERROR] [util.rs:301] ["request failed, retry"] [err_code=KV:Unknown] [err="Other(SendError(\"...\"))"]
[2020/11/04 06:29:58.856 +00:00] [ERROR] [util.rs:301] ["request failed, retry"] [err_code=KV:Unknown] [err="Other(SendError(\"...\"))"]
[2020/11/04 06:29:58.857 +00:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=https://db-pd-0.db-pd-peer.tidb1323802126710738944.svc:2379]
[2020/11/04 06:29:58.879 +00:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fb975e55ed0 for subchannel 0x7fb975ea1700"]
[2020/11/04 06:29:58.895 +00:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=https://db-pd-1.db-pd-peer.tidb1323802126710738944.svc:2379]
[2020/11/04 06:29:58.905 +00:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fb975e56b90 for subchannel 0x7fb975ea1c40"]
[2020/11/04 06:29:58.913 +00:00] [INFO] [util.rs:484] ["connected to PD leader"] [endpoints=https://db-pd-1.db-pd-peer.tidb1323802126710738944.svc:2379]
[2020/11/04 06:29:58.913 +00:00] [INFO] [util.rs:190] ["heartbeat sender and receiver are stale, refreshing ..."]
[2020/11/04 06:29:58.936 +00:00] [WARN] [util.rs:209] ["updating PD client done"] [spend=79.041063ms]
Injecting network faults such as corrupted or duplicated packets in a chaos-testing environment may reproduce this problem.
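To make the expected behavior concrete, here is a minimal sketch in Python of the retry policy I would expect. It is not TiKV's actual code: the names `is_transient` and `save_with_retry`, the marker strings, the attempt count, and the backoff values are all hypothetical. It matches on the error message because, in the log above, the reset is wrapped in an `Io(Custom { kind: InvalidInput, ... })` error, so the error kind alone does not identify it as a connection reset.

```python
import time

# Substrings that mark an error as transient. This list is an assumption
# for illustration, taken from the message in the TiKV log above.
TRANSIENT_MARKERS = ("connection reset by peer", "os error 104")


def is_transient(message: str) -> bool:
    """Return True if the error message looks like a transient network reset."""
    lowered = message.lower()
    return any(marker in lowered for marker in TRANSIENT_MARKERS)


def save_with_retry(save, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call save() and retry transient failures with exponential backoff.

    save is any zero-argument callable that uploads one backup file.
    Non-transient errors, and the last failed attempt, are re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return save()
        except OSError as err:
            if attempt == max_attempts or not is_transient(str(err)):
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

With a policy like this, an upload that hits `Connection reset by peer` once or twice would be retried instead of failing the whole backup, while errors such as a permission failure would still surface immediately.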
- What did you expect to see?
Backup succeeds.
- What did you see instead?
Backup fails.
- What version of BR and TiDB/TiKV/PD are you using?
br -V
Release Version: v4.0.8
Git Commit Hash: c2ed897feadaae1ae27a4111cd44b1840941e9be
Git Branch: heads/refs/tags/v4.0.8
Go Version: go1.13
UTC Build Time: 2020-10-30 08:14:21
Race Enabled: false
tidb-server -V
Release Version: v4.0.8
Edition: Community
Git Commit Hash: 66ac9fc31f1733e5eb8d11891ec1b38f9c422817
Git Branch: heads/refs/tags/v4.0.8
UTC Build Time: 2020-10-30 08:21:16
GoVersion: go1.13
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false
tikv-server -V
/ # /tikv-server -V
TiKV
Release Version: 4.0.8
Edition: Community
Git Commit Hash: 83091173e960e5a0f5f417e921a0801d2f6635ae
Git Commit Branch: heads/refs/tags/v4.0.8
UTC Build Time: 2020-10-30 08:40:33
Rust Version: rustc 1.42.0-nightly (0de96d37f 2019-12-19)
Enable Features: jemalloc mem-profiling portable sse protobuf-codec
Profile: dist_release
pd-server -V
Release Version: v4.0.8
Edition: Community
Git Commit Hash: 775b6a5ef517f8ab2f43fef6418bbfc7d6c9c9dc
Git Branch: heads/refs/tags/v4.0.8
UTC Build Time: 2020-10-30 08:15:09
Some related issues: https://github.com/googleapis/google-cloud-go/issues/108 https://stackoverflow.com/questions/51624788/google-cloud-storage-batch-move-file-failure-connection-reset-by-peer