Wrong sender portid 3034, expected 0
Hi!
I recently ran into a problem when calling netlink.NeighSubscribeWithOptions() and netlink.LinkSubscribeWithOptions() from multiple goroutines.
The error comes from the following check:
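```go
// Skip any message whose sender port ID is not the kernel's (nl.PidKernel, i.e. 0),
// reporting it through the error callback if one is set.
if from.Pid != nl.PidKernel {
	if cberr != nil {
		cberr(fmt.Errorf("Wrong sender portid %d, expected %d", from.Pid, nl.PidKernel))
	}
	continue
}
```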
I inspected the received message with strace:
```
[pid 3039] <... recvfrom resumed>{{len=32, type=RTM_GETNEIGH, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=13, pid=0}, {ifi_family=AF_BRIDGE, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0}}, 65536, 0, {sa_family=AF_NETLINK, nl_pid=3034, nl_groups=0x000004}, [112->12]) = 32
...
[pid 3039] write(1, "{\"Target Network Interface\":\"tes"..., 184{"Target Network Interface":"testeth0","error":"Wrong sender portid 3034, expected 0","level":"error","msg":"NeighSubscribeWithOptions error found","time":"2022-07-21T16:57:51+09:00"}
```
As the output shows, the sender address carries nl_pid=3034 (the ID of another thread) instead of PidKernel (0), so the check above rejects the message and reports the error.
I think this check is unnecessary. Could you please review it?
I can reproduce the same issue in my code with netlink.RouteSubscribeWithOptions().
@DreamerKMP @stv0g Please feel free to open a pull request with your proposed fix.
The netlink(7) man-page describes the purpose of nl_pid as follows:
> nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. If a process owns several netlink sockets, then nl_pid can be equal to the process ID only for at most one socket. There are two ways to assign nl_pid to a netlink socket. If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates.
The same check occurs in several places:
- https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/neigh_linux.go#L406
- https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/xfrm_monitor_linux.go#L62
- https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/proc_event_linux.go#L154
- https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/socket_linux.go#L159
- https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/addr_linux.go#L372