netfox Added diff states (no binary serialization, no static typing) - includes fix for the first-state problem.

All states are sent UNRELIABLE_ORDERED regardless if they are a full state (contains all properties) or a diff state (contains only the properties with new values) Once the client receives any state, he sends a RELIABLE rpc to the server that he has indeed received a full state, and so it can start receiving diff states. Once this RELIABLE rpc is received by the server, all future states sent to that specific peer are diff states, instead of full states. Diff states are states whose property/values included are only those which have changed from the previous tick.

This implementation also solves the diff states problem mentioned in #264 (this is the 2nd way, and I think its the cleanest and most performant)

I have tested it and found no issues.

Solves #157 As for the bandwidth it saves, see forest-brawl example: "Displaceable:mass", "Displaceable:impulse", ":speed" which change only once via powerup, so 3/5 states are skipped with diff states. Assuming the variables were of same size, with diff states, there is 60% less bandwidth.

Requires #277 which is just static typing additions (so this PR is easy to read, instead of having double the size)

Sep 14 '24 13:09 TheYellowArchitect

I really hope the author reads this

Oct 02 '24 20:10 olivatooo

@olivatooo getting there :slightly_smiling_face:

Oct 09 '24 09:10 elementbound

Cleaned up the PR a bit so we can focus on the actual changes. In the future, please keep in mind the following for PRs:

One PR should implement a single feature
The PR shouldn't contain changes not related to the feature it implements
Preferably don't start rewriting unrelated parts of the codebase along different coding conventions, instead start a discussion. In this PR specifically, I mean adding static typing annotations to variables inside function scopes.
Even though Godot modifies it, you can just not commit the Brawler scene :slightly_smiling_face:

Thanks!

Oct 10 '24 20:10 elementbound

Even though Godot modifies it, you can just not commit the Brawler scene

I apologize for this slipping through, it pops up in each PR I make by default, I gotta manually exclude this, and it seems in this PR I missed it. Btw, which godot version do you use to edit netfox? I use 4.1.0

The PR shouldn't contain changes not related to the feature it implements One PR should implement a single feature commit-message: "remove input diff fragment"

This PR still contains the input changes which are unrelated to diff states. I will move them to a different PR. Freely close that input refactor PR.

Btw on the removal of static typing, I gotta say its harder to read this PR without types, because you cant tell if a property variable is PropertyEntry or String

Oct 11 '24 08:10 TheYellowArchitect

I removed the input refactor so this PR is exclusively on state diffs. I found a weird bug though on Godot 4.1 (haven't tested other versions)

It sends state ticks in order (e.g. 45, 46, 47) but sometimes they arrive out of order (e.g. 45, 47, 46) which shouldn't happen because the RPC is "unreliable_ordered". I want to test this with other godot versions because until Godot 4.4 dev3 some RPC bugs are fixed, and this may include this.

Nonetheless, this made me realize that even if Godot doesn't bug, packet loss and packets being shuffled are a possibility, so it's good to cover those edge scenarios. So I made a condition block for this scenario (see the for picked_tick in range(_latest_state_tick, tick))

The above works and has no bugs but there is still that rare jittering bug where you run 0 ping but there is a jittering at some player once in 10 boots or something (which existed before this PR), which I suspect is caused by this "unreliable_ordered" failure. I will open an issue on this once I confirm its root and can reproduce it consistently.

Anyway, I left the debug comments and stuff inside, turning this into a draft for now until I remove the debug comments and look deeper into this bug (e.g. test other godot versions) Despite it being in draft, it works as intended btw, so test freely.

I will open a PR for the input refactor later, gotta do tasks irl will take a few days.

Oct 12 '24 11:10 TheYellowArchitect

netfox-send-state-tick-shuffle Yup, checkedout to master, opened Godot 4.3, this bug existed prior to this PR and will continue existing with this PR. Sends tick 93, 95, 94. Or see the following: netfox-non-linear-state-receiving

So I pause this PR to investigate this bug https://github.com/foxssake/netfox/issues/300

Oct 13 '24 09:10 TheYellowArchitect

Ok so, I have a few big concerns regarding this implementation. The way I understand it:

Host starts sending full states
Client receives full state, ack's
Host starts sending diff states compared to previous state

Let me know if I missed something.

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the original issue.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

Oct 15 '24 21:10 elementbound

The way I understand it:

Host starts sending full states

Client receives full state, ack's

Host starts sending diff states compared to previous state

Correct overview :+1:

The third point is very problematic imo - client ack's the first full state, but nothing guarantees they'll have full info for the rest of the states. Diffs might get lost, as they're sent unreliably. So once a diff is lost, the whole chain breaks.

It is true, but it should correct itself once the diff changes again. For example, for position and rotation since they change so often, losing a tick doesn't matter because it will be updated in another diff tick. But to play devil's advocate and support your argument, if the different property is movement_speed in forest-brawl, that happens only once via powerup and expires in like 10 seconds, so that is 10 seconds of a property mismatch.

I think diff states should include a reference tick number, i.e. saying "this state describes the diff from this given reference tick". This way, clients will only need to have full info for the reference tick and none other. This was discussed under the https://github.com/foxssake/netfox/issues/157#issuecomment-1866977539.

The issue above, from what I understand, says that a diff should reference a tick with full state, but there are no full states here once diff states begin.

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

This is one of the 2 solutions to this problem. The other solution being akin to input_redundancy where diff'd properties are sent for the following ticks when they change, so if a property changes at tick 52, it sends the delta diff as expected, but at tick 53 it again sends its property value that it has at tick 53, and the same happens for tick 54. However at tick 55+ it doesn't send the property at all if it hasn't changed. So for the above example of tick 52,53,54, its 3 ticks of let's call it state_redundancy :P

The first solution is a hybrid of full state and diff state, and the second solution is doubling down on the diff state. There are pros and cons for both, for example, the second solution is ideal if you have 200+ dropped items in a game world whose position doesn't change, so they simply don't send their position (or state) and they consume no bandwidth.

For its implementation, I would add a var counter: int either to PropertyEntry or in an Array[int] which detects how many times a property is sent after it has changed. So once it changes, it adds 1 per _record_tick and it does an if check with state_redundancy and if its equal, it doesn't send it again. And ofc it resets to 0 once the property changes value. This would be per-proprety. So position and rotation would send just once or twice per change for example (currently it is just once) and movement_speed would send for example 4 times.

Which solution would you like? I can code both, and either solves the problem.

Lastly, I think it would benefit the code if either the full-state and diff-state code flows were more visibly separated ( e.g. different RPC's to handle them ), or collapsed into a single flow, with as little branching as possible. I feel the current version has way too many checks to comfortably manage the two flows in one.

I will push a commit today or tomorrow, give me a confirmation if you want me to split in 2 RPCs. Personally I don't mind.

Thank you for the detailed code review. I honestly didn't expect it, and I feel kinda guilty that I have paused the commit. Its just 2 codeblocks I have to debug/confirm they work as intended and I have prioritized other tasks irl, I didn't expect such rigorous review.

I will push the above commit since its just 2 codeblocks (ignore it freely) which smoothens the packet loss, then I will do a followup commit which addresses all the code review comments.

Oct 16 '24 10:10 TheYellowArchitect

I pushed, packet loss is fixed/addressed. It is good enough to remove from draft, though I keep the debug comments until everything is determined to be perfect. I addressed most of the coding review comments in these commits.

I will test this with the latest master merges and probably push a commit tomorrow because of that forloop which duplicates states, see that comment. It works, but it may cause jittering so I will test it extensively later to confirm.

@elementbound I request to confirm with the current code, whether to split the _submit_state RPC to _submit_full_state and _submit_diff_state, and also which of the 2 solutions to apply for the packet loss on a diff property (as mentioned in the above comment)

After the above confirmations, I will implement them on this PR.

Oct 16 '24 11:10 TheYellowArchitect

I'd also prefer if full states weren't sent once at the beginning, but at regular intervals, to make sure that even if something goes wrong with the diffs, we still get the occasional full state to course-correct. Similar to how video codecs have keyframes.

Added this, was very easy.

It seems the bug mentioned in the code comment causes problems for this PR. I will investigate it, open an issue to explain it in depth and PR for it. Turning this into a draft until then :face_with_diagonal_mouth:

Oct 18 '24 09:10 TheYellowArchitect

I fixed the above in this PR, because after all it leads to an out of index error with this PR and its hard to explain in a branch where this bug doesn't stick out like main. So, to explain the bug which https://github.com/foxssake/netfox/pull/278/commits/aadd51eefa1b1152b82125ac0766ede5564888c2 fixes:

_latest_state_tick initially gets the value NetworkTime.tick - 1 so NetworkRollback at _tick() can do the following forloop:

# Ask all rewindables to submit their earliest inputs
_resim_from = NetworkTime.tick
before_loop.emit()
	
# from = Earliest input amongst all rewindables
var from = _resim_from

# to = Current tick
var to = NetworkTime.tick
	
# for tick in from .. to:
for tick in range(from, to):

If from aka _resim_from is NetworkTime.tick, then the forloop will not run at all (because from == to, and to is excluded). This is the reason _latest_state_tick was initialized as NetworkTime.tick - 1

So a special use case was needed for the very first rollback tick, a simple if which I added.

Now as to the real problem: _latest_state_tick before this PR always had a value NetworkTime.tick - 1 even though it had 0 _states. _states were empty, yet _latest_state_tick had NetworkTime.tick. _latest_state_tick should get a value only after a state is set. See how it gets a value after a state is added to _states

states[received_tick] = sanitized
_latest_state_tick = received_tick

The variable _latest_state_tick gets a value in _submit_state for non-authority, and in _record_tick for authority. Both after _states gets added a key/value pair.

But the bug was that it should be -1 before any state is set. With this PR, _latest_state_tick starts with -1 and doesn't change until an actual state is added. And on handling the -1 for the very first rollback loop, see the start of this comment.

Oct 18 '24 11:10 TheYellowArchitect

https://github.com/foxssake/netfox/pull/278/commits/5cfebd416835f81987e105ca93c2d9d9ee8c6a12 splits the _submit_state RPC to _submit_full_state and _submit_diff_state. It is indeed cleaner this way. And splitting them up, I caught and fixed 2 mistakes, one being the needless setting of properties from the previous tick, and the other being in the sanitized dictionary as you pointed out in the code review.

This PR is ready for review, I don't have anything to add to it, and I tested it even with 200ms and I noticed nothing different from sending exclusively full states (aka vanilla behaviour)

Oct 18 '24 13:10 TheYellowArchitect

But the bug was that it should be -1 before any state is set. With this PR, _latest_state_tick starts with -1 and doesn't change until an actual state is added.

_latest_state was working as intended, as a way to track where to roll back in case we don't own input for the node. Since the original diff state impl expected gapless state history, I suspect how that managed to broke things. Either way, I still suggest to expect gaps in state history, and instead of always comparing to the previous state, specify which tick is the received state diffed from.

Nov 10 '24 10:11 elementbound

Added some metrics so we can track how much data we save by diff states. Only reports the number of properties, for actual data usage I don't see any other way than serializing stuff into a buffer, which is too much work for a simple metric.

Nov 12 '24 11:11 elementbound