matchbox icon indicating copy to clipboard operation
matchbox copied to clipboard

Matchbox is broken on Firefox

Open janhohenheim opened this issue 3 years ago • 3 comments

I tried following @johanhelsing's blog post series for extreme-bevy and got some success, however after a few seconds my clients randomly disconnect after a short lag spike, while my Chromium peer tells me the following:

Failed to send: JsValue(InvalidStateError: 
Failed to execute 'send' on 'RTCDataChannel':
RTCDataChannel.readyState is not 'open'

coming from here: https://github.com/johanhelsing/matchbox/blob/3f86e86f3df6c4bff6d7b3f1324177d936119635/matchbox_socket/src/webrtc_socket/wasm/message_loop.rs#L107

After some testing and consulting @gschup on Discord, I found out that this happens every time at least one of the two peers uses Firefox.

I tested the following configurations, all resulting in the error:

  • Firefox versions:
    • stable 100.0
    • nightly 102.0a1
  • setup:
    • two peers on same machine
    • two peers on two machines
  • operating systems
    • Windows 10
    • macOS Monterey
  • games
    • https://github.com/janhohenheim/extreme-bevy
      • My Cargo.toml has all git dependencies pinned to specific revisions, so this should be very reproducible
      • networking.rs
    • https://helsing.studio/box_game/ (live version)
    • https://gschup.github.io/bevy_ggrs_demo/ (live version)

untested:

  • Firefox versions:
    • older than 100.0
  • operating systems
    • linux
  • setup
    • more than two peers

janhohenheim avatar May 14 '22 18:05 janhohenheim

Hi, and thanks for the detailed bug report. I've also seen this happening. I didn't touch anything in the deployments, but suddenly it stopped working for me in FF, so I concluded it was either my network setup, OS or browser that had somehow changed for the worse.

Seeing that others have this issue as well, it makes me suspect that something changed in firefox. Perhaps we do something slightly wrong that has now become a bug, or it might be a regression on Mozilla's part.

I guess one route to debug this would be to try out some older Firefox versions and see if they work.

However, I tried running in FF <-> FF (version 100.0) on Windows 11 again now, and I could not reproduce the issue anymore. The game freezes the tab completely after a few minutes on https://helsing.studio/box_game . On https://gschup.github.io/bevy_ggrs_demo/ on the other hand (which I think is using newer versions of both bevy, matchbox and bevy_ggrs) everything seems to be totally fine, so I'm hoping what I see is a bevy/demo related issue that's been fixed.

There is a media panel developer extension for firefox, https://addons.mozilla.org/en-US/firefox/addon/devtools-media-panel/ , when the disconnections happen, do you see anything useful in there? I assume the console log shows nothing of value?

johanhelsing avatar May 15 '22 08:05 johanhelsing

Indeed, the media panel and console show nothing of value. But I did find out something interesting: I cannot recreate the bug on the live version of my code.
I then tried fixing it locally by recreated the hosted files by also running trunk build --release and wasm-opt, just like in my continuous deployment, but that did not solve the error.
Thinking about what other differences the hosting could have, I thought it could be a problem with me running trunk serve locally, so I tried caddy instead, to no avail. I don't know what other difference my local and remote hosting has. I'd be interested in whether or not the bug is also fixed for you on my live version.

janhohenheim avatar May 15 '22 23:05 janhohenheim

Also seeing this behavior.

  • Windows 10
  • Firefox 103.0
  • Bevy 0.7
  • Matchbox main

Tried to break it down into a relatively minimal example. Includes a debug window showing messages sent/received over the data channel.

Additionally I've hosted a live version here.

Fails on both local and live as long as at least one of the peers is using Firefox, but it doesn't fail consistently.

The thing that's throwing me for a loop, and is consistent with the comment above, is that installing the devtools media panel extension fixes the live version. I absolutely can't reproduce it post extension installation, but when I open two incognito windows (thus disabling the extension), it starts failing again... Even with the extension local continues to fail, however.

Would love to dig in a little more, but I'm a bit out of my depth here.

Piefayth avatar Jul 23 '22 00:07 Piefayth

installing the devtools media panel extension fixes the live version.

I'm seeing the same thing. Started happening for me again after uninstalling the extension... So looks like we have a heisenbug here. Fun times.

At least it's not a regression in matchbox, I get this bug also with the old version. So, I'll not let it block the 0.4 release.

For posterity, today I tested with:

- matchbox: dad63ae9
- bevy 0.8.1
- ggrs 0.9.2
- windows 11
- ff 106.0.1
- matchbox: v0.1.0
- bevy 0.5.0
- ggrs 0.5.0
- windows 11
- ff 106.0.1

Any leads on this issue would be very welcome!

johanhelsing avatar Oct 27 '22 06:10 johanhelsing

When I tested today, I could not reproduce it.

- matchbox: edcac47a (0.5 release candidate)
- bevy 0.9.1
- bevy_ggrs 0.11
- windows 11
- ff 108.0.2

Anyone else still running into this? Would be cool if it's just a ~~ff bug that's been fixed~~ and we can remove the warning from the readme/docs.

Edit: I tested with ff 106.0.1 portable

- matchbox: edcac47a (0.5 release candidate)
- bevy 0.9.1
- bevy_ggrs 0.11
- windows 11
- ff portable 106.0.1

And could not see the issue, so probably not something that's fixed in ff.

johanhelsing avatar Jan 13 '23 08:01 johanhelsing

I ran a git bisect, and found out it was fixed in 3103067. Which is really strange. That one just attaches an event handler that logs ice connection state changes.

I tried further digging into this, and in fact, the bug re-appears if I remove the event handler. The event handler doesn't even need to do anything, it just needs to be added.

In other words, this works:

fn create_rtc_peer_connection(config: &WebRtcSocketConfig) -> RtcPeerConnection {
    #[derive(Serialize)]
    struct IceServerConfig {
        urls: Vec<String>,
        username: String,
        credential: String,
    }

    let mut peer_config = RtcConfiguration::new();
    let ice_server = &config.ice_server;
    let ice_server_config = IceServerConfig {
        urls: ice_server.urls.clone(),
        username: ice_server.username.clone().unwrap_or_default(),
        credential: ice_server.credential.clone().unwrap_or_default(),
    };
    let ice_server_config_list = [ice_server_config];
    peer_config.ice_servers(&serde_wasm_bindgen::to_value(&ice_server_config_list).unwrap());
    let connection = RtcPeerConnection::new_with_configuration(&peer_config).unwrap();

    let oniceconnectionstatechange: Box<dyn FnMut(_)> = Box::new(move |_event: JsValue| {});
    let oniceconnectionstatechange = Closure::wrap(oniceconnectionstatechange);
    connection
        .set_oniceconnectionstatechange(Some(oniceconnectionstatechange.as_ref().unchecked_ref()));

    connection
}

While this doesn't:

fn create_rtc_peer_connection(config: &WebRtcSocketConfig) -> RtcPeerConnection {
    #[derive(Serialize)]
    struct IceServerConfig {
        urls: Vec<String>,
        username: String,
        credential: String,
    }

    let mut peer_config = RtcConfiguration::new();
    let ice_server = &config.ice_server;
    let ice_server_config = IceServerConfig {
        urls: ice_server.urls.clone(),
        username: ice_server.username.clone().unwrap_or_default(),
        credential: ice_server.credential.clone().unwrap_or_default(),
    };
    let ice_server_config_list = [ice_server_config];
    peer_config.ice_servers(&serde_wasm_bindgen::to_value(&ice_server_config_list).unwrap());
    let connection = RtcPeerConnection::new_with_configuration(&peer_config).unwrap();

    // not attaching event handler causes bug in ff

    connection
}

Maybe we finally can get rid of the FF warning :D

johanhelsing avatar Jan 13 '23 09:01 johanhelsing

I had a theory that FF was garbage collecting the connection for some reason, unless the media panel or event handler was added.

I tried confirming this by adding the following code:

    let connection_1 = connection.clone();
    let leak: Box<dyn FnMut()> = Box::new(move || {
        std::hint::black_box(&connection_1);
    });
    Closure::wrap(leak).forget();

However I still see disconnects if I have only that and not the empty event handler.

Even though I don't understand why it fixes it, I'll add a note to the code saying not to remove the event handler. I'll close this issue and remove the ff warning from the readme. We can open a new issue about getting rid of the workaround.

johanhelsing avatar Jan 13 '23 10:01 johanhelsing