Dynamic socket buffer allocation

Open turekt opened this issue 3 years ago • 0 comments

Hi,

the motivation behind this PR has several elements:

  • I have noticed that when nftables returns a big response, the socket read can return ENOBUFS (no buffer space available). This was explained and discussed here: https://github.com/google/nftables/pull/191/files/0d4369aacbd8b10bc86765a69851d0d01a821fd8#r979436375
  • I tried a fix in the netlink library that would receive all of the data after ENOBUFS is returned, but this is just not possible. From the netlink man page:
    Netlink is not a reliable protocol. It tries its best to deliver a message to its destination(s), but may drop messages when an out-of-memory condition or other error occurs. For reliable transfer the sender can request an acknowledgement from the receiver by setting the NLM_F_ACK flag. An acknowledgment is an NLMSG_ERROR packet with the error field set to 0. The application must generate acknowledgements for received messages itself. The kernel tries to send an NLMSG_ERROR message for every failed packet. A user process should follow this convention too.

    However, reliable transmissions from kernel to user are impossible in any case. The kernel can't send a netlink message if the socket buffer is full: the message will be dropped and the kernel and the user-space process will no longer have the same view of kernel state. It is up to the application to detect when this happens (via the ENOBUFS error returned by recvmsg(2)) and resynchronize.
  • If ENOBUFS happens, part of the message is certain to be dropped. The only way to recover from this error is therefore to resend the same message with a bigger buffer. Additionally, I saw issues #178 and #179, and it seemed to me that both changes can be covered with this PR.

With this in mind, and since it was discussed to try and address the ENOBUFS issue in the netlink lib (https://github.com/google/nftables/pull/191/files/0d4369aacbd8b10bc86765a69851d0d01a821fd8#r982856106), I am introducing a PR for discussion which:

  • introduces a ReceiveBuffer (ExecuteBuffer) func which takes a user-supplied BufferAllocationFunc that allocates buffers for the underlying socket (covering issue #178)
  • introduces a default BufferAllocationFunc (the default allocation strategy), which is similar to the previous peek-loop-allocate but, in case ENOBUFS happens, automatically resizes the socket read and write buffers
  • exposes the ReadBuffer and WriteBuffer methods via the bufferGetter interface for easier calculation of buffer size by user applications
  • minimally changes the already implemented functions to retain backwards compatibility
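
To make the idea of a pluggable allocation strategy concrete, here is a minimal, self-contained sketch. It is not the code from the PR: allocFunc, growOnENOBUFS, receiveWith and fakeRecv are hypothetical names, the real BufferAllocationFunc signature may differ, and the socket is replaced by a fake receiver so the example runs without a netlink connection.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// allocFunc is a hypothetical stand-in for BufferAllocationFunc. It gets the
// size used on the previous attempt and the error that attempt returned, and
// reports the size to use next.
type allocFunc func(prevSize int, lastErr error) int

// growOnENOBUFS doubles the size after ENOBUFS and leaves it unchanged otherwise.
func growOnENOBUFS(prevSize int, lastErr error) int {
	if errors.Is(lastErr, syscall.ENOBUFS) {
		return prevSize * 2
	}
	return prevSize
}

// receiveWith calls recv with sizes chosen by alloc until recv succeeds,
// fails with something other than ENOBUFS, or maxAttempts (at least 1) is
// reached. It returns the payload and the size used on the last attempt.
func receiveWith(recv func(size int) ([]byte, error), alloc allocFunc, startSize, maxAttempts int) ([]byte, int, error) {
	size := startSize
	var lastErr error
	for i := 0; i < maxAttempts; i++ {
		size = alloc(size, lastErr)
		data, err := recv(size)
		if err == nil {
			return data, size, nil
		}
		if !errors.Is(err, syscall.ENOBUFS) {
			return nil, size, err
		}
		// Data was dropped, so the whole request has to be repeated.
		lastErr = err
	}
	return nil, size, lastErr
}

// fakeRecv simulates a reply of need bytes: any smaller buffer overflows.
func fakeRecv(need int) func(int) ([]byte, error) {
	return func(size int) ([]byte, error) {
		if size < need {
			return nil, syscall.ENOBUFS
		}
		return make([]byte, need), nil
	}
}

func main() {
	data, size, err := receiveWith(fakeRecv(10000), growOnENOBUFS, 4096, 5)
	fmt.Println(len(data), size, err) // prints "10000 16384 <nil>"
}
```

Passing the strategy as a function lets an application swap in its own growth policy (for example a fixed large buffer) without touching the retry loop.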

In the end, I am not sure if this is the best approach: applications could still catch the ENOBUFS error themselves, resize the read and write buffers with SetReadBuffer or SetWriteBuffer, and then resend the message, which would make this PR unnecessary. We can change the PR as per your feedback.
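
For comparison, the application-side alternative could look roughly like the sketch below. The conn interface and message type are stand-ins defined here so the example is self-contained; SetReadBuffer and SetWriteBuffer are the method names mentioned above, while the Execute shape, executeWithResize, fakeConn and the doubling policy are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// message is a stand-in for a netlink message.
type message struct{ Data []byte }

// conn lists only the methods this sketch needs (assumed shapes).
type conn interface {
	Execute(m message) ([]message, error)
	SetReadBuffer(bytes int) error
	SetWriteBuffer(bytes int) error
}

// executeWithResize resends m after every ENOBUFS, doubling the socket
// buffers each time, until it succeeds or size would exceed maxSize.
// size is the buffer size the connection is assumed to start with.
func executeWithResize(c conn, m message, size, maxSize int) ([]message, error) {
	for {
		replies, err := c.Execute(m)
		if !errors.Is(err, syscall.ENOBUFS) {
			return replies, err // success or an unrelated error
		}
		if size >= maxSize {
			return nil, fmt.Errorf("reply does not fit in %d bytes: %w", size, err)
		}
		size *= 2
		if size > maxSize {
			size = maxSize
		}
		if err := c.SetReadBuffer(size); err != nil {
			return nil, err
		}
		if err := c.SetWriteBuffer(size); err != nil {
			return nil, err
		}
	}
}

// fakeConn fails with ENOBUFS until its read buffer reaches need bytes.
type fakeConn struct{ readBuf, need, sent int }

func (f *fakeConn) Execute(m message) ([]message, error) {
	f.sent++
	if f.readBuf < f.need {
		return nil, syscall.ENOBUFS
	}
	return []message{m}, nil
}
func (f *fakeConn) SetReadBuffer(n int) error  { f.readBuf = n; return nil }
func (f *fakeConn) SetWriteBuffer(n int) error { return nil }

func main() {
	c := &fakeConn{readBuf: 4096, need: 10000}
	replies, err := executeWithResize(c, message{}, 4096, 1<<20)
	fmt.Println(len(replies), c.sent, err) // prints "1 3 <nil>"
}
```

The request is sent three times here because each ENOBUFS means the earlier reply was lost and cannot be resumed.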

Let me know what you think.

turekt, Sep 30 '22 16:09