Does WebRTC need keepalive?
We need to further our understanding of the properties of WebRTC sessions. The specific question of this issue is: since Raiden messaging via WebRTC is non-continuous (unlike audio/video), do we need to add active keepalive messaging in order to keep NAT ports open?
My preliminary research suggests no, given an up-to-spec RTCPeerConnection implementation: https://hpbn.co/webrtc/#rtcpeerconnection
- [ ] Does our WebRTC implementation have a working keepalive?
- [ ] Test WebRTC through NAT
I would like to add an additional concern to this topic. Even though we probably do not need a keepalive to keep the connection open, there are cases where the connection is closed on one side while the other side doesn't notice.
There can be multiple reasons for this. One cause I have found so far is the Raiden node crashing.
The LC (or rather its JS WebRTC implementation) sometimes exhibits unexpected behavior where the channel appears to be open but messages are dropped (and CPU load sits at 100%).
A keepalive message would detect such failures and trigger the creation of a new channel.
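To illustrate the idea, here is a minimal sketch of an application-level keepalive over a data channel. All names (`KeepaliveMonitor`, `SendFn`, the `'ping'` payload) are hypothetical and not from the Raiden codebase; `send` stands in for `RTCDataChannel.send`:

```typescript
// Abstracts RTCDataChannel.send for the purpose of this sketch.
type SendFn = (msg: string) => void;

class KeepaliveMonitor {
  private awaitingPong = false;

  constructor(
    private send: SendFn,
    private onDead: () => void, // e.g. tear down and re-create the channel
  ) {}

  // Call periodically (e.g. via setInterval). If the previous ping was
  // never answered, the peer is considered gone even though the channel
  // still looks open.
  ping(): void {
    if (this.awaitingPong) {
      this.onDead();
      return;
    }
    this.awaitingPong = true;
    this.send('ping');
  }

  // Call from the channel's `onmessage` handler when a pong arrives.
  handlePong(): void {
    this.awaitingPong = false;
  }
}
```

The point is that the failure is detected by the absence of an answer, which covers both the crashed-node case and the "channel looks open but drops messages" case described above.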
At first sight it also seems that aiortc does not handle keepalive itself, see https://github.com/aiortc/aiortc/issues/225#issuecomment-555752962
Update: The error in the LC library does not seem to happen anymore. Apart from that, keepalive would have to be planned together with the LC as a feature, although I'm still not sure whether it is necessary. In a recent test a channel stayed open for 20 minutes without any messages being sent. In other tests, the connection broke (for reasons unknown).
@andrevmatos do you have any opinions on that?
I also see the ICE consent checks fail from time to time. This was also described in the linked issue, so it might be the case that our implementation, again, behaves differently from browser implementations.
I think WebRTC does have keepalive out of the box. We have not had issues with it, even with connections living for several minutes. We have occasionally seen connectionState become failed after some time, but that seems unrelated to keepalive and rather some internal connection issue, which was solved by closing and retrying the channel.
Do you remember what the failure looked like? I received "ICE consent check failed" a couple of times after a while.
We didn't get an error out of it. Only the connectionstatechange event was emitted when the state became failed (or similar), with no error raised, and we use that to identify the failure and tear down and retry the connection.
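The teardown-and-retry reaction described here can be sketched as follows. `PeerLike` is a minimal stand-in for `RTCPeerConnection`, and `watchConnection` is a hypothetical helper, not the actual LC code; the returned function plays the role of a `connectionstatechange` listener:

```typescript
// Minimal stand-in for the parts of RTCPeerConnection used here.
interface PeerLike {
  connectionState: string;
  close(): void;
}

function watchConnection(peer: PeerLike, retry: () => void): () => void {
  // To be registered as the `connectionstatechange` listener: no error is
  // ever thrown, the failure is visible only as a state transition.
  return () => {
    if (peer.connectionState === 'failed') {
      peer.close(); // teardown the broken connection
      retry();      // re-create the RTCPeerConnection from scratch
    }
  };
}
```

The design point matches the comment above: failure detection hinges on observing the state change event, not on catching an exception.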