Matchbox is broken on Firefox
I tried following @johanhelsing's blog post series for extreme-bevy and got some success, however after a few seconds my clients randomly disconnect after a short lag spike, while my Chromium peer tells me the following:
Failed to send: JsValue(InvalidStateError:
Failed to execute 'send' on 'RTCDataChannel':
RTCDataChannel.readyState is not 'open'
coming from here: https://github.com/johanhelsing/matchbox/blob/3f86e86f3df6c4bff6d7b3f1324177d936119635/matchbox_socket/src/webrtc_socket/wasm/message_loop.rs#L107
After some testing and consulting @gschup on Discord, I found out that this happens every time at least one of the two peers uses Firefox.
I tested the following configurations, all resulting in the error:
-
Firefox versions:
- stable 100.0
- nightly 102.0a1
-
setup:
- two peers on same machine
- two peers on two machines
-
operating systems
- Windows 10
- macOS Monterey
-
games
- https://github.com/janhohenheim/extreme-bevy
- My Cargo.toml has all git dependencies pinned to specific revisions, so this should be very reproducible
- networking.rs
- https://helsing.studio/box_game/ (live version)
- https://gschup.github.io/bevy_ggrs_demo/ (live version)
- https://github.com/janhohenheim/extreme-bevy
untested:
-
Firefox versions:
- older than 100.0
-
operating systems
- linux
-
setup
- more than two peers
Hi, and thanks for the detailed bug report. I've also seen this happening. I didn't touch anything in the deployments, but suddenly it stopped working for me in FF, so I concluded it was either my network setup, OS or browser that had somehow changed for the worse.
Seeing that others have this issue as well, it makes me suspect that something changed in firefox. Perhaps we do something slightly wrong that has now become a bug, or it might be a regression on Mozilla's part.
I guess one route to debug this would be to try out some older Firefox versions and see if they work.
However, I tried running in FF <-> FF (version 100.0) on Windows 11 again now, and I could not reproduce the issue anymore. The game freezes the tab completely after a few minutes on https://helsing.studio/box_game . On https://gschup.github.io/bevy_ggrs_demo/ on the other hand (which I think is using newer versions of both bevy, matchbox and bevy_ggrs) everything seems to be totally fine, so I'm hoping what I see is a bevy/demo related issue that's been fixed.
There is a media panel developer extension for firefox, https://addons.mozilla.org/en-US/firefox/addon/devtools-media-panel/ , when the disconnections happen, do you see anything useful in there? I assume the console log shows nothing of value?
Indeed, the media panel and console show nothing of value. But I did find out something interesting: I cannot recreate the bug on the live version of my code.
I then tried fixing it locally by recreated the hosted files by also running trunk build --release and wasm-opt, just like in my continuous deployment, but that did not solve the error.
Thinking about what other differences the hosting could have, I thought it could be a problem with me running trunk serve locally, so I tried caddy instead, to no avail. I don't know what other difference my local and remote hosting has. I'd be interested in whether or not the bug is also fixed for you on my live version.
Also seeing this behavior.
- Windows 10
- Firefox 103.0
- Bevy 0.7
- Matchbox
main
Tried to break it down into a relatively minimal example. Includes a debug window showing messages sent/received over the data channel.
Additionally I've hosted a live version here.
Fails on both local and live as long as at least one of the peers is using Firefox, but it doesn't fail consistently.
The thing that's throwing me for a loop, and is consistent with the comment above, is that installing the devtools media panel extension fixes the live version. I absolutely can't reproduce it post extension installation, but when I open two incognito windows (thus disabling the extension), it starts failing again... Even with the extension local continues to fail, however.
Would love to dig in a little more, but I'm a bit out of my depth here.
installing the devtools media panel extension fixes the live version.
I'm seeing the same thing. Started happening for me again after uninstalling the extension... So looks like we have a heisenbug here. Fun times.
At least it's not a regression in matchbox, I get this bug also with the old version. So, I'll not let it block the 0.4 release.
For posterity, today I tested with:
- matchbox: dad63ae9
- bevy 0.8.1
- ggrs 0.9.2
- windows 11
- ff 106.0.1
- matchbox: v0.1.0
- bevy 0.5.0
- ggrs 0.5.0
- windows 11
- ff 106.0.1
Any leads on this issue would be very welcome!
When I tested today, I could not reproduce it.
- matchbox: edcac47a (0.5 release candidate)
- bevy 0.9.1
- bevy_ggrs 0.11
- windows 11
- ff 108.0.2
Anyone else still running into this? Would be cool if it's just a ~~ff bug that's been fixed~~ and we can remove the warning from the readme/docs.
Edit: I tested with ff 106.0.1 portable
- matchbox: edcac47a (0.5 release candidate)
- bevy 0.9.1
- bevy_ggrs 0.11
- windows 11
- ff portable 106.0.1
And could not see the issue, so probably not something that's fixed in ff.
I ran a git bisect, and found out it was fixed in 3103067. Which is really strange. That one just attaches an event handler that logs ice connection state changes.
I tried further digging into this, and in fact, the bug re-appears if I remove the event handler. The event handler doesn't even need to do anything, it just needs to be added.
In other words, this works:
fn create_rtc_peer_connection(config: &WebRtcSocketConfig) -> RtcPeerConnection {
#[derive(Serialize)]
struct IceServerConfig {
urls: Vec<String>,
username: String,
credential: String,
}
let mut peer_config = RtcConfiguration::new();
let ice_server = &config.ice_server;
let ice_server_config = IceServerConfig {
urls: ice_server.urls.clone(),
username: ice_server.username.clone().unwrap_or_default(),
credential: ice_server.credential.clone().unwrap_or_default(),
};
let ice_server_config_list = [ice_server_config];
peer_config.ice_servers(&serde_wasm_bindgen::to_value(&ice_server_config_list).unwrap());
let connection = RtcPeerConnection::new_with_configuration(&peer_config).unwrap();
let oniceconnectionstatechange: Box<dyn FnMut(_)> = Box::new(move |_event: JsValue| {});
let oniceconnectionstatechange = Closure::wrap(oniceconnectionstatechange);
connection
.set_oniceconnectionstatechange(Some(oniceconnectionstatechange.as_ref().unchecked_ref()));
connection
}
While this doesn't:
fn create_rtc_peer_connection(config: &WebRtcSocketConfig) -> RtcPeerConnection {
#[derive(Serialize)]
struct IceServerConfig {
urls: Vec<String>,
username: String,
credential: String,
}
let mut peer_config = RtcConfiguration::new();
let ice_server = &config.ice_server;
let ice_server_config = IceServerConfig {
urls: ice_server.urls.clone(),
username: ice_server.username.clone().unwrap_or_default(),
credential: ice_server.credential.clone().unwrap_or_default(),
};
let ice_server_config_list = [ice_server_config];
peer_config.ice_servers(&serde_wasm_bindgen::to_value(&ice_server_config_list).unwrap());
let connection = RtcPeerConnection::new_with_configuration(&peer_config).unwrap();
// not attaching event handler causes bug in ff
connection
}
Maybe we finally can get rid of the FF warning :D
I had a theory that FF was garbage collecting the connection for some reason, unless the media panel or event handler was added.
I tried confirming this by adding the following code:
let connection_1 = connection.clone();
let leak: Box<dyn FnMut()> = Box::new(move || {
std::hint::black_box(&connection_1);
});
Closure::wrap(leak).forget();
However I still see disconnects if I have only that and not the empty event handler.
Even though I don't understand why it fixes it, I'll add a note to the code saying not to remove the event handler. I'll close this issue and remove the ff warning from the readme. We can open a new issue about getting rid of the workaround.