snapshot: Transfer files concurrently to speed up snapshot transfer
Currently, snapshot transfer works by sending a GetFileRequest for every file known to be in the remote snapshot. These requests are sent sequentially, one file at a time.
The only real configuration knobs for tuning the throughput of this transfer are the throttle, which can be set when initializing the braft::Node, and the runtime flag raft_max_byte_count_per_rpc, which determines how many chunks a large file is broken into during the transfer. The default is 128KiB, so a 1MiB file is transferred in about 8 GetFileRequests.
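For context, here is a minimal sketch of tuning that existing knob from application code, assuming the application already initializes gflags the way the braft examples do. The flag name raft_max_byte_count_per_rpc comes from the description above; the value and the call site are only illustrative, and the namespace may be gflags:: rather than google:: depending on how gflags was built.

```cpp
#include <gflags/gflags.h>

// Raise the per-RPC chunk size from the 128KiB default to 512KiB, so a 1MiB
// file needs ~2 GetFileRequests instead of ~8. Illustrative value only.
void tune_snapshot_chunk_size() {
    google::SetCommandLineOption("raft_max_byte_count_per_rpc", "524288");
}
```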
This works well for snapshots with a handful of large files, but if a snapshot contains hundreds or thousands of small files, transferring it can be quite slow.
For example, when I create a snapshot with 100k files locally on my development machine, it can take up to 30 minutes to transfer all of the files in that snapshot. Even though the latency per file is low, each file costs a full round trip plus a flush of __raft_meta on the receiving end.
This patch adds concurrency to these transfers. When a remote snapshot is transferred locally, up to raft_max_get_file_request_concurrency GetFileRequests will be sent concurrently. This defaults to 64.
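This is not the actual patch, but a minimal sketch of the dispatch pattern it describes: cap the number of in-flight GetFileRequests at the flag value instead of waiting for each file to finish before starting the next. Here copy_one_file() is a hypothetical stand-in for a single blocking GetFileRequest round trip, and the plain std::thread/condition_variable machinery stands in for whatever braft uses internally.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Mirrors the new flag's default of 64 concurrent GetFileRequests.
static const int kMaxGetFileRequestConcurrency = 64;

// Hypothetical stand-in: perform one blocking GetFileRequest round trip
// (possibly several chunks) for a single file and write it locally.
void copy_one_file(const std::string& filename) { /* ... */ }

void copy_snapshot_files(const std::vector<std::string>& files) {
    std::mutex mu;
    std::condition_variable cv;
    int inflight = 0;
    std::vector<std::thread> workers;

    for (const std::string& file : files) {
        {   // Block until we are below the concurrency cap.
            std::unique_lock<std::mutex> lk(mu);
            cv.wait(lk, [&] { return inflight < kMaxGetFileRequestConcurrency; });
            ++inflight;
        }
        workers.emplace_back([&mu, &cv, &inflight, file] {
            copy_one_file(file);
            std::lock_guard<std::mutex> lk(mu);
            --inflight;
            cv.notify_one();
        });
    }
    for (std::thread& t : workers) {
        t.join();
    }
}
```

The cap keeps resource usage bounded while overlapping the per-file round trips that dominate the sequential transfer.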
With this patch, the 100k file snapshot consistently transfers in under 10 seconds on my development machine.
This should resolve https://github.com/baidu/braft/issues/362.
I tried using the patch to test on our cluster implementation, but I am running into this error on the follower when it tries to load the snapshot from the leader. It seems to me like a race condition caused by the concurrency:
W20250219 19:51:42.219461 95753 snapshot.cpp:809] Fail to copy, error_code 22 error_msg [E22][127.0.1.1:7107][E22]Fail to read from path=/tmp/node2/state/snapshot/snapshot_00000000000000001031 filename=db_snapshot/OPTIONS-000007 : Invalid argument writer path /tmp/node3/state/snapshot/temp
I20250219 19:51:42.722414 95755 node.cpp:2625] node default_group:127.0.0.1:8107:8108 received InstallSnapshotRequest last_included_log_index=1031 last_include_log_term=5 from 127.0.0.1:7107:7108 when last_log_id=(index=0,term=0)
W20250219 19:51:42.723734 95753 snapshot.cpp:232] Snapshot file exist but meta not found so delete it, path: /tmp/node3/state/snapshot/temp/db_snapshot
This happens only when there are 50 or more files in the snapshot. With fewer files, it works.
This is interesting, thanks for trying it. I have not seen that same behavior, but the fork of braft which I am using in production is actually pretty far behind master of this repository, so some unrelated change I don't have in my tree may be causing a problem. I will pull in all changes from this repo and retest some time this week.
I"m not sure why, but I can not reproduce the same issue in my application even after pulling in all changes from master of this repository up to ab0017f0b98d429138d83a04d3ed351197d671a9. A snapshot with more than 100k files transfers just fine.
Which release or changeset of braft are you using? Which brpc version/changeset?
I updated my project to the latest SHA (ab0017f0b98d429138d83a04d3ed351197d671a9) and I still face the same issue. I have just about 150 files in the snapshot directory, on x86 Linux.
W20250225 16:36:57.618615 718456 snapshot.cpp:815] Fail to copy, error_code 22 error_msg [E22][127.0.1.1:6107][E22]Fail to read from path=/tmp/next-node-data-1/state/snapshot/snapshot_00000000000000000175 filename=db_snapshot/001307.sst : Invalid argument writer path /tmp/next-node-data-3/state/snapshot/temp
From the error log above, it seems like there could be a race condition with the temp file path: does the follower write to a temp file and move it to the actual file name? If so, how does this logic handle multiple concurrent downloads at the follower end?
I changed raft_max_get_file_request_concurrency to 1 in the patch, and it works, so there is definitely some concurrency issue on the follower.
This issue occurs because the latest version of the code limits the file server to reading only one file at a time: https://github.com/baidu/braft/blob/master/src/braft/file_reader.cpp#L64
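To illustrate the failure mode, here is a simplified sketch, not the actual file_reader.cpp code: if the file server only tracks one file being read at a time, a GetFileRequest for a second file that arrives while the first is still in flight gets rejected, which would surface on the follower as the error_code 22 ("Invalid argument") seen in the logs above. Class and member names below are illustrative.

```cpp
#include <cerrno>
#include <string>

// Simplified model of a file server that streams only one file at a time.
class SingleFileReader {
public:
    // Returns 0 on success, or an errno-style code on failure.
    int read_file(const std::string& filename, bool is_last_chunk) {
        if (_reading && filename != _current) {
            // A concurrent request for a different file is rejected, which the
            // copier on the other side reports as "Invalid argument" (E22).
            return EINVAL;
        }
        _reading = true;
        _current = filename;
        // ... read and return the requested chunk ...
        if (is_last_chunk) {
            _reading = false;  // allow the next file to start
        }
        return 0;
    }

private:
    bool _reading = false;
    std::string _current;
};
```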