Batch RPCs for synchronization
:sparkles: Description
Godot sends RPCs immediately, so each separate RPC call incurs per-packet overhead in bandwidth. Gathering RPC calls and sending them off together at the end of the tick loop could therefore save traffic. Header overhead is 20-60 bytes for IPv4 or 40 bytes for IPv6, plus 8 bytes for UDP and some more for Godot, meaning 28+ to 68+ bytes per RPC; batching n calls into a single packet saves roughly (n - 1) times that overhead.
Use case
This would affect most of the synchronization code in netfox transparently, meaning users would see lower bandwidth usage simply by upgrading. Games could support a wider target audience by working better on poor connections.
Requirements
- RPC calls are gathered into batches and sent at the end of the tick loop.
- The implementation must be able to support the current synchronization nodes ( RollbackSynchronizer, StateSynchronizer ), and potential future ones ( e.g. RewindableAction - see #394 )
- The implementation must support arbitrary data formats for transfer, so it's not tied to individual synchronizer nodes
Specs
The feature consists of multiple parts to implement a mechanism similar to Godot's current RPC system.
Instead of doing RPC calls directly, the affected nodes would call this system, which batches the calls, and sends them off at the end of the tick loop. On the receiving end, the system would use the received data to figure out which methods to call on which nodes, and with what arguments.
Node IDs
- The synchronizer nodes register their methods with the RPC host
- The RPC host assigns numerical IDs to the registered methods
- The methods must also register their argument count, so netfox can check for mismatching argument counts
- The ID format is 10 bits for owner peer ID, and 14 bits for method ID
- The peer IDs are assigned and broadcast by a dedicated peer ( by default, the host )
- The peer IDs are based on Godot's built-in peer IDs
- The method IDs are based on the Callable's hash if that's deterministic for all peers, otherwise based on a caller-supplied string name during registration ( e.g. hash the node path + method name, re-hash with a slight change on hash collision )
- The registered callable itself can be used to queue a batched RPC
- The callable is translated to its network ID internally
- The ID must be freed by the synchronizer nodes once no longer needed ( e.g. in _exit_tree() )
The above may result in code like this:
# Register a method with 2 arguments
BatchRPC.register(_submit_input, 2)
# Call the RPC
BatchRPC.call(_submit_input, [input_data, tick], UNRELIABLE, target_peer)
# Free the registered ID
BatchRPC.unregister(_submit_input)
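For illustration, the 10 + 14 bit layout could be packed into a single integer roughly like below; the helper names are placeholders, not part of the spec:

```gdscript
# Hypothetical sketch of the 24-bit network ID described above
# peer_index: 10-bit index assigned and broadcast by the dedicated peer
# method_id: 14-bit ID derived from the Callable hash or a registered name
static func _encode_net_id(peer_index: int, method_id: int) -> int:
    assert(peer_index < (1 << 10) and method_id < (1 << 14))
    return (peer_index << 14) | method_id

static func _decode_net_id(net_id: int) -> Array:
    return [net_id >> 14, net_id & 0x3FFF]  # [peer_index, method_id]
```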
Sending and receiving batches
- RPC calls are saved to a queue
- RPC batches are sent at the end of the tick loop
- Manually triggering the send must also be possible ( e.g. by a flush() call )
- It must be possible to turn off automatic sends
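As a rough sketch, the queue-and-flush behaviour above could look like this, assuming netfox's NetworkTime.after_tick_loop signal; the grouping and sending helpers are placeholders:

```gdscript
# Sketch only - queue calls during the tick loop, flush them afterwards
var _queue: Array = []
var auto_flush: bool = true

func _ready() -> void:
    # Assumed: NetworkTime emits after_tick_loop once the tick loop finishes
    NetworkTime.after_tick_loop.connect(_on_after_tick_loop)

func queue_call(net_id: int, args: Array, mode: int, to_peer: int = 0) -> void:
    _queue.append([net_id, args, mode, to_peer])

func flush() -> void:
    # Group queued calls per target peer and transfer mode, send one batch each
    for batch in _group_by_peer_and_mode(_queue):  # placeholder helper
        _send_batch(batch)  # placeholder helper - regular RPC or send_bytes
    _queue.clear()

func _on_after_tick_loop() -> void:
    if auto_flush:
        flush()
```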
Data may be transmitted as a regular RPC, or using custom packets.
RPCs
RPCs require a node, so the RPC host could be made an autoload. However, to avoid Godot quirks, the class should be used through static methods, with the autoload storing itself as an instance reference on the class. This enables writing regular RPC methods without having to access the autoload from user code. Example:
extends Node
class_name BatchRPC

static var _inst: BatchRPC

func _ready():
    _inst = self

@rpc(...)
func _submit_batch(...):
    ...
Calls would take the form of BatchRPC.register(...).
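As a sketch of that pattern, the static entry points would just forward to the autoload instance; the internal method names here are made up:

```gdscript
# Hypothetical static wrappers forwarding to the autoload instance
static func register(method: Callable, arg_count: int) -> void:
    _inst._register(method, arg_count)  # placeholder internal method

static func unregister(method: Callable) -> void:
    _inst._unregister(method)  # placeholder internal method
```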
Custom packets
The RPC host would send byte buffers using send_bytes. To receive those, it must somehow receive a SceneMultiplayer instance and connect to its peer_packet signal. Since users may also want to use this signal, a custom packet header must be used, so both the RPC host and user code can decide which code path should handle each packet. Using one or two bytes for this header is enough, as long as it's documented.
This approach possibly has less overhead than regular RPCs.
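A minimal sketch of the header idea, using SceneMultiplayer's send_bytes() and peer_packet; the header value and the batch handler are placeholders:

```gdscript
const BATCH_HEADER := 0xF0  # arbitrary, documented marker byte for batch packets

func _setup(scene_multiplayer: SceneMultiplayer) -> void:
    scene_multiplayer.peer_packet.connect(_on_peer_packet)

func _send_batch_bytes(scene_multiplayer: SceneMultiplayer, data: PackedByteArray, to_peer: int) -> void:
    var packet := PackedByteArray([BATCH_HEADER])
    packet.append_array(data)
    scene_multiplayer.send_bytes(packet, to_peer)

func _on_peer_packet(sender_id: int, packet: PackedByteArray) -> void:
    if packet.is_empty() or packet[0] != BATCH_HEADER:
        return  # not a batch packet - leave it to user code
    _handle_batch(sender_id, packet.slice(1))  # placeholder handler
```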
Distribution
netfox core
Notes
- ❓ Why not just build upon Godot's existing RPC system?
- While RPC calls can be captured, Godot doesn't expose methods to get the network identifier used to decide which node the RPC should be called on. Sending NodePaths over the network would take too much data. This warrants a custom solution until Godot exposes its own netid solution.
- See https://github.com/godotengine/godot-proposals/issues/11044
- ❓ Why include owner peer ID in the netid, instead of using a global pool of IDs?
- This way peers retain the authority of generating IDs for callables owned by them. It is also a design preference for the first implementation of this feature.
- ❓ What about packet sizes and MTU?
- AFAIK should be handled by the underlying transport, e.g. ENet
- Custom RPC implementation can remove warnings about invalid nodes
- e.g. if it's an update for an RBS that doesn't exist yet, it can be buffered
- Custom RPCs also unlock global visibility filters
- e.g. don't send any packets to peer#x
I don't think batching is a good idea for netfox to implement.
Godot / ENET do batch RPCs, this can be confirmed with a packet capture.
It also automatically caches the NodePaths for the RPC call itself. There might be a small gain to be had with caching the RS property keys.
There is a high risk here of making things worse or confusing:
- netfox will be doing chunking/batching work that the transport will probably undo - ie ENET does its own aggregations, fragmentations and reassembly.
- the multiplayer api supports a variety of transports - enet, websockets, webrtc, steam peers. Each again will have their own ideas on how to handle the data.
- you now have the mess of trying to tune how much you want stuffed into packets within time windows
We should at the very least make a small POC test that shows tangible benefits of batching.
If there really is an issue to solve here, it feels like it should be done upstream of netfox. netfox is a high level netcode library on top of a high level multiplayer api; it isn't a great place to be doing low level networking schedules.
Thanks for checking @albertok, I remember @TeTpaAka mentioning that batching is not being done. This part needs more clarity imo. I agree with your points otherwise.
The POC sounds like a good idea to get a feel for how much bandwidth could be saved, if we want to go ahead with this. Especially since property caching is already being done in #159 / #358.
> Godot / ENET do batch RPCs, this can be confirmed with a packet capture.
My wireshark trace clearly shows separate packets for each RPC. Can you share your test setup so I can verify your findings? I know that ENet can handle packet aggregation, but Godot does not use it, see https://github.com/godotengine/godot/issues/78867
And I would be surprised if it did, since the code flushes after each put_packet, which is internally called to transfer RPCs.
https://github.com/godotengine/godot/blob/a77a28c02951def7a7e2c3457259a60260a40abe/modules/enet/enet_multiplayer_peer.cpp#L410
I would imagine packet capturing any game would show it.
Here for instance you can see three rollback synchronizer rpcs; 'motion' is player input direction in my game.
Reading more on UDP packet sizes, 500 bytes is roughly what you want anyway, which ENET here seems to be achieving.
While 1500 MTU is standard on a straight ethernet connection, it's lower when being tunnelled - say for example a user is connected via a VPN service.
> I would imagine packet capturing any game would show it.
> Here for instance you can see three rollback synchronizer rpcs; 'motion' is player input direction in my game.
> Reading more on UDP packet sizes, 500 bytes is roughly what you want anyway, which ENET here seems to be achieving.
> While 1500 MTU is standard on a straight ethernet connection, it's lower when being tunnelled - say for example a user is connected via a VPN service.
I'm pretty sure this is just the input redundancy of netfox. I.e. netfox sends the input of the last (by default 3) frames in case a packet is dropped. See https://github.com/foxssake/netfox/blob/7e7f87ad73f71a52fa3cf3c55c11959b97e2e463/addons/netfox/rollback/rollback-synchronizer.gd#L529-L533
EDIT: show more of the relevant section in the netfox code.
I created a small test program:
extends Control

const IP_ADDRESS := "localhost"
const PORT := 12345
const MAX_CLIENTS := 4

func _ready() -> void:
    if OS.get_cmdline_args().find("--server") > 0:
        # Create server.
        var peer = ENetMultiplayerPeer.new()
        peer.create_server(PORT, MAX_CLIENTS)
        multiplayer.multiplayer_peer = peer
    else:
        await get_tree().create_timer(.5).timeout
        # Create client.
        var peer = ENetMultiplayerPeer.new()
        peer.create_client(IP_ADDRESS, PORT)
        multiplayer.multiplayer_peer = peer
        await multiplayer.connected_to_server
        test_rpc.rpc_id(0, "test 0")
        test_rpc.rpc_id(0, "test 1")

@rpc("any_peer")
func test_rpc(value: String):
    print(value)
The wireshark trace clearly shows the two RPC calls being transmitted in different packets.
I think you're right. At some point netfox switched to sending the entire dictionaries repeatedly instead of just their values, which was the intention in the original PR: https://github.com/foxssake/netfox/pull/221#discussion_r1583876376
So the good news is we found some more data savings. Granted, compression probably takes care of that automatically.
Speaking of redundancy, would batching have mitigations for packet loss?
A lost packet potentially means we lost several messages.
> I think you're right. At some point netfox switched to sending the entire dictionaries repeatedly instead of just their values, which was the intention in the original PR: #221 (comment)
> So the good news is we found some more data savings. Granted, compression probably takes care of that automatically.
This is being worked on, see https://github.com/foxssake/netfox/pull/430. Also, compression can help, but general purpose compression algorithms can never be as good as not sending superfluous data.
> Speaking of redundancy, would batching have mitigations for packet loss?
> A lost packet potentially means we lost several messages.
True. But the chance of a packet getting lost increases with higher network load, so overall the packet loss rate should go down. And since almost all packets we plan to batch together are from the same tick, a lost packet would result in one missing tick instead of part of one tick. With most games, occasional missing ticks should be pretty unnoticeable. And for input, we are already using redundancy in order to mitigate packet loss. Grouping all state into one (or a few) large packet(s) is a lot closer to the norm for netcode, which seems to work for most games today.
So, no, there is no plan to implement redundancy for the batching. The state is intentionally sent unreliably, as individual ticks are not important in the grand scheme of things. Sending it reliably would result in hiccups when a packet is retransmitted, and sending the state of multiple ticks just bloats the transmitted data, as a missing tick can be smoothed over by interpolation.
> Reading more on UDP packet sizes, 500 bytes is roughly what you want anyway, which ENET here seems to be achieving.
500 bytes seems low for the modern internet architecture. IPv6 requires a minimum MTU of 1280, but most networks can go higher (even for IPv4). And while Godot does not provide packet grouping, it does do packet fragmentation (plus the network layers handle their own fragmentation pretty reliably, unless the connection is congested). I would still try to stay below 1200 and split up the packets ourselves, as we can deal with partial packets (since they contain multiple RPCs), while the ENet layer is agnostic to the data and has to discard the whole packet if one fragment is lost.
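As a rough sketch of the splitting idea (with a placeholder _serialize_entry() standing in for whatever per-RPC encoding the batch ends up using):

```gdscript
const MAX_PACKET_SIZE := 1200  # stay below common tunnelled MTUs

func _split_batch(entries: Array) -> Array:
    var packets: Array = []
    var current := PackedByteArray()
    for entry in entries:
        var bytes: PackedByteArray = _serialize_entry(entry)  # placeholder
        if not current.is_empty() and current.size() + bytes.size() > MAX_PACKET_SIZE:
            packets.append(current)
            current = PackedByteArray()
        current.append_array(bytes)
    if not current.is_empty():
        packets.append(current)
    return packets
```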
Huge +1 for batching support; browsers can only handle a couple hundred packets (frames in websockets) per second, so if you have a dozen players and low network tick rate, everything freezes. Batching is a requirement to fix this