feat: add nim hole punching tests
Adds TCP Hole Punching tests for nim.
closes https://github.com/status-im/nim-libp2p/issues/966
@thomaseizinger, I'm getting the following error when running the tests:
nim-v1_1_x_nim-v1_1__tcp_-dialer-1 | TRC 2023-11-06 16:37:00.372+00:00 got message topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}"
nim-v1_1_x_nim-v1_1__tcp_-dialer-1 | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"
After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field
But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41
@thomaseizinger, I'm getting the following error when running the tests:
nim-v1_1_x_nim-v1_1__tcp_-dialer-1 | TRC 2023-11-06 16:37:00.372+00:00 got message topics="libp2p yamux" tid=30 h="{Data, {Syn}, streamId: 4, length: 0}" nim-v1_1_x_nim-v1_1__tcp_-dialer-1 | TRC 2023-11-06 16:37:00.373+00:00 Closing yamux connection topics="libp2p yamux" tid=30 error="Peer used our reserved stream id"After investigating, I found the following spec about stream id: https://github.com/hashicorp/yamux/blob/master/spec.md#streamid-field
FYI: We vendored the yamux spec into the specs repo: https://github.com/libp2p/specs/tree/master/yamux
That test is running nim against nim. The only Rust component in there involved is the relay. Both nodes are clients to the relay, so should use odd IDs. The relay will thus use even IDs. Does that make sense?
But the Rust implementation seems to say something different: https://github.com/libp2p/rust-libp2p/blob/135942d3190634beebdc614c70117ced61c9cd99/muxers/mplex/src/codec.rs#L41
You are referencing the mplex implementation here. The holepunch test suite will always use noise+yamux to secure the relayed connection.
Thanks for the answer. I thought it could be a problem with the Rust relay. I couldn't find the Rust code implementing the creation of stream ids in Rust. I'll try to reproduce only using a Nim relay.
It is here: https://github.com/libp2p/rust-yamux
We run this in the interop tests so I am quite surprised there would be an issue here.
Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.
Sorry, it was an issue with our relayv2 and yamux interaction, it has been fixed now.
Cool! So the tests are already pulling its weight by finding bugs, nice 😁
There seems to be another issue when rust-libp2p is the dialer. Did you experience that locally too? I haven't looked at it closely yet :)
Those tests are great, thanks a lot for working on it.
I have a workaround locally that makes all the tests pass, but it didn't seem to be the right fix, maybe it is. I believe the current issue is somewhat related to the previous one. If I understand correctly, the listener (dcutr client) creates an inbound TCP connection, and the dialer (dcutr server) an outbound one. I believe the listener creates an outbound Yamux stream and the dialer is also considering it as outbound. This makes both of them use odd stream IDs, causing the error. This is my current hypothesis that needs to be validated. Would you happen to know if the Rust dcutr server does that? If that's the case, should the stream created by the listener (dcutr client) be outbound or the same as the TCP connection direction?
The underlying TCP connections to the relay are both outbound from the client's perspective. Both of them dial the relay, right?
For the relayed connections between the two clients, the client with the reservation acts as a listener (i.e. yamux server) and other side as the dialer (i.e. yamux client).
So there are two yamux connections involved on either end (layered on top of each other). The first one (together with noise) secures and multiplexes the connection with the relay, the second one does so for the connection between the two clients.
Note that in theory, the relayed connection on top could also use mplex or tls. I just chose to use yamux and noise again.
I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it? Sorry for not stating it clearly. I believe the current issue happens during the connection upgrade right after both peers tried a new direct connection, outside the relay one.
cc @lchenut
I was talking about the new TCP connection created during dcutr. It doesn't use the relay connection, does it?
Ah yes, correct!
The spec says that A acts as the dialer on the new connection whereas B opened the DCUtR stream to initiate the hole-punch. A is also the node that initiated the relayed connection.
In these hole-punching tests, A is the dialer, and B is the listener. The spec says "For the purpose of all protocols run on top of this TCP connection, A is assumed to be the client and B the server.". It seems to me it doesn't explicitly say who should the TCP client and server, but I believe it is reasonable to assume A is the TCP client (outbound) and B is the TCP server (inboud). It does explicitly say that from the Yamux perspective (a protocol running on top of the TCP connection), A is the Yamux client (outbound) and B is the Yamux server (inbound).
I believe the Rust dialer (A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.
I believe the Rust dialer (
A) implemented it correctly, but the Nim listener (B) is incorrectly thinking it is outbound as well.
That could be it! In Rust we solve this with a "dial as listener" instruction that overrides, what role we assume for ourselves on the resulting connection. This is essential for hole-punching to work. I am surprised the nim <> nim test works. Why is that one not failing too?
Green tests 🎉
I'm still trying to understand what's going on. I'll add it here later. Thanks for your help.
Hopefully this explains the issue and the fix https://github.com/status-im/nim-libp2p/pull/994