Wrong sender portid 3034, expected 0

Open DreamerKMP opened this issue 3 years ago • 5 comments

Hi!

I recently ran into a problem when calling netlink.NeighSubscribeWithOptions() and netlink.LinkSubscribeWithOptions() from multiple goroutines.

The code causing the problem is below.

if from.Pid != nl.PidKernel {
  if cberr != nil {
    cberr(fmt.Errorf("Wrong sender portid %d, expected %d", from.Pid, nl.PidKernel))
  }
  continue
}

I checked the actual message with strace:

[pid  3039] <... recvfrom resumed>{{len=32, type=RTM_GETNEIGH, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=13, pid=0}, {ifi_family=AF_BRIDGE, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0}}, 65536, 0, {sa_family=AF_NETLINK, nl_pid=3034, nl_groups=0x000004}, [112->12]) = 32
...
[pid  3039] write(1, "{\"Target Network Interface\":\"tes"..., 184{"Target Network Interface":"testeth0","error":"Wrong sender portid 3034, expected 0","level":"error","msg":"NeighSubscribeWithOptions error found","time":"2022-07-21T16:57:51+09:00"}

In the strace output, nl_pid is 3034 (the ID of another thread) rather than PidKernel (0), so the check above reports the error and skips the message.
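To make the behaviour concrete, here is a minimal, self-contained sketch of the same check. The names checkSender and pidKernel are hypothetical stand-ins (pidKernel mirrors nl.PidKernel, which is 0); this is not the library's code, only an isolated illustration of why a sender port ID of 3034 produces the logged error.

```go
package main

import "fmt"

// pidKernel stands in for nl.PidKernel: port ID 0 is the kernel's address.
const pidKernel uint32 = 0

// checkSender is a hypothetical helper that mirrors the check quoted above.
// It returns an error when the message did not come from the kernel.
func checkSender(pid uint32) error {
	if pid != pidKernel {
		return fmt.Errorf("Wrong sender portid %d, expected %d", pid, pidKernel)
	}
	return nil
}

func main() {
	// 0 is the kernel; 3034 is the nl_pid seen in the strace output.
	for _, pid := range []uint32{0, 3034} {
		if err := checkSender(pid); err != nil {
			fmt.Println("skipped:", err)
			continue
		}
		fmt.Println("accepted: message from kernel")
	}
}
```

With the values from the strace output, the second iteration takes the error path, which matches the "Wrong sender portid 3034, expected 0" entry in the log.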

I think this check is unnecessary. Could you please review it?

DreamerKMP avatar Jul 21 '22 08:07 DreamerKMP

I can reproduce the issue in my code using netlink.RouteSubscribeWithOptions().

stv0g avatar Jun 23 '23 17:06 stv0g

@DreamerKMP @stv0g please feel free to open a pull request with your proposed fix

aboch avatar Jun 23 '23 19:06 aboch

The netlink(7) man page describes the purpose of nl_pid as follows:

nl_pid is the unicast address of netlink socket. It's always 0 if the destination is in the kernel. For a user-space process, nl_pid is usually the PID of the process owning the destination socket. However, nl_pid identifies a netlink socket, not a process. If a process owns several netlink sockets, then nl_pid can be equal to the process ID only for at most one socket. There are two ways to assign nl_pid to a netlink socket. If the application sets nl_pid before calling bind(2), then it is up to the application to make sure that nl_pid is unique. If the application sets it to 0, the kernel takes care of assigning it. The kernel assigns the process ID to the first netlink socket the process opens and assigns a unique nl_pid to every netlink socket that the process subsequently creates.

stv0g avatar Jul 15 '23 08:07 stv0g

The check occurs in several places:

  • https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/neigh_linux.go#L406
  • https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/xfrm_monitor_linux.go#L62
  • https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/proc_event_linux.go#L154
  • https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/socket_linux.go#L159
  • https://github.com/vishvananda/netlink/blob/b4489369ddadad1cee455910da26e70b7073fb7b/addr_linux.go#L372

stv0g avatar Jul 15 '23 08:07 stv0g