test-plans icon indicating copy to clipboard operation
test-plans copied to clipboard

feat: add nim hole punching tests

Open diegomrsantos opened this issue 2 years ago • 15 comments

Adds TCP Hole Punching tests for nim.

closes https://github.com/status-im/nim-libp2p/issues/966

diegomrsantos avatar Nov 06 '23 17:11 diegomrsantos

@thomaseizinger, I'm getting the following error when running the tests:

nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.372+00:00 got message                                topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}"
nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection                   topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"

After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field

But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41

diegomrsantos avatar Nov 06 '23 17:11 diegomrsantos

@thomaseizinger, I'm getting the following error when running the tests:

nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.372+00:00 got message                                topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}"
nim-v1_1_x_nim-v1_1__tcp_-dialer-1           | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection                   topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"

After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field

FYI: We vendored the yamux spec into the specs repo: https://github.com/libp2p/specs/tree/master/yamux

That test is running nim against nim. The only Rust component in there involved is the relay. Both nodes are clients to the relay, so should use odd IDs. The relay will thus use even IDs. Does that make sense?

But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41

You are referencing the mplex implementation here. The holepunch test suite will always use noise+yamux to secure the relayed connection.

thomaseizinger avatar Nov 06 '23 19:11 thomaseizinger

Thanks for the answer. I thought it could be a problem with the Rust relay. I couldn't find the Rust code implementing the creation of stream ids in Rust. I'll try to reproduce only using a Nim relay.

diegomrsantos avatar Nov 06 '23 20:11 diegomrsantos

It is here: https://github.com/libp2p/rust-yamux

We run this in the interop tests so I am quite surprised there would be an issue here.

thomaseizinger avatar Nov 06 '23 20:11 thomaseizinger

Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.

diegomrsantos avatar Nov 21 '23 15:11 diegomrsantos

Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.

Cool! So the tests are already pulling its weight by finding bugs, nice 😁

There seems to be another issue when rust-libp2p is the dialer. Did you experience that locally too? I haven't looked at it closely yet :)

thomaseizinger avatar Nov 21 '23 20:11 thomaseizinger

Those tests are great, thanks a lot for working on it.

I have a workaround locally that makes all the tests pass, but it didn't seem to be the right fix, maybe it is. I believe the current issue is somewhat related to the previous one. If I understand correctly, the listener (dcutr client) creates an inbound TCP connection, and the dialer (dcutr server) an outbound one. I believe the listener creates an outbound Yamux stream and the dialer is also considering it as outbound. This makes both of them use odd stream IDs, causing the error. This is my current hypothesis that needs to be validated. Would you happen to know if the Rust dcutr server does that? If that's the case, should the stream created by the listener (dcutr client) be outbound or the same as the TCP connection direction?

diegomrsantos avatar Nov 21 '23 21:11 diegomrsantos

The underlying TCP connections to the relay are both outbound from the client's perspective. Both of them dial the relay, right?

For the relayed connections between the two clients, the client with the reservation acts as a listener (i.e. yamux server) and other side as the dialer (i.e. yamux client).

So there are two yamux connections involved on either end (layered on top of each other). The first one (together with noise) secures and multiplexes the connection with the relay, the second one does so for the connection between the two clients.

Note that in theory, the relayed connection on top could also use mplex or tls. I just chose to use yamux and noise again.

thomaseizinger avatar Nov 21 '23 21:11 thomaseizinger

I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it? Sorry for not stating it clearly. I believe the current issue happens during the connection upgrade right after both peers tried a new direct connection, outside the relay one.

cc @lchenut

diegomrsantos avatar Nov 21 '23 21:11 diegomrsantos

I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it?

Ah yes, correct!

The spec says that A acts as the dialer on the new connection whereas B opened the DCUtR stream to initiate the hole-punch. A is also the node that initiated the relayed connection.

thomaseizinger avatar Nov 21 '23 21:11 thomaseizinger

In these hole-punching tests, A is the dialer, and B is the listener. The spec says "For the purpose of all protocols run on top of this TCP connection, A is assumed to be the client and B the server.". It seems to me it doesn't explicitly say who should the TCP client and server, but I believe it is reasonable to assume A is the TCP client (outbound) and B is the TCP server (inboud). It does explicitly say that from the Yamux perspective (a protocol running on top of the TCP connection), A is the Yamux client (outbound) and B is the Yamux server (inbound).

I believe the Rust dialer (A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.

diegomrsantos avatar Nov 21 '23 22:11 diegomrsantos

I believe the Rust dialer (A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.

That could be it! In Rust we solve this with a "dial as listener" instruction that overrides, what role we assume for ourselves on the resulting connection. This is essential for hole-punching to work. I am surprised the nim <> nim test works. Why is that one not failing too?

thomaseizinger avatar Nov 21 '23 22:11 thomaseizinger

Green tests 🎉

thomaseizinger avatar Nov 22 '23 21:11 thomaseizinger

I'm still trying to understand what's going on. I'll add it here later. Thanks for your help.

diegomrsantos avatar Nov 22 '23 23:11 diegomrsantos

Hopefully this explains the issue and the fix https://github.com/status-im/nim-libp2p/pull/994

diegomrsantos avatar Nov 30 '23 21:11 diegomrsantos