swarmkit icon indicating copy to clipboard operation
swarmkit copied to clipboard

Verify network allocation on creating network

Open xinfengliu opened this issue 6 years ago • 3 comments

Current network allocation is run asynchronously with network (object) creation, when a user specifies an overlapped subnet, the network creation succeeds but this network is useless with empty IPAM config.

For example:

docker network create -d overlay --subnet=10.2.0.0/24 testol
docker network create -d overlay --subnet=10.2.0.0/24 testol2

The second command does not fail. testol2 is created with a different ID but useless (IPAM Config is null).

Signed-off-by: Xinfeng Liu [email protected]

- What I did

This PR makes network creation use a channel to wait for the result of network allocation from Allocator before return. Because of this change, some _test.go files are modified.

Update: Found a duplicated line in "make coverage" of direct.mk that caused CI flaky. I removed that duplicated line.

- How I did it Use a per-network channel to wait for the result of network allocation from Allocator. If the network allocation fails, delete the network object that was just created and return an error to the user. So the user cannot create a network with overlapped subnet.

- How to test it Added TestCreateNetworkOverlapIP to network_test.go.

- Description for the changelog

xinfengliu avatar Nov 23 '19 02:11 xinfengliu

Codecov Report

Merging #2914 into master will decrease coverage by 0.16%. The diff coverage is 59.25%.

@@            Coverage Diff             @@
##           master    #2914      +/-   ##
==========================================
- Coverage   61.62%   61.45%   -0.17%     
==========================================
  Files         139      139              
  Lines       22615    22642      +27     
==========================================
- Hits        13936    13915      -21     
- Misses       7194     7249      +55     
+ Partials     1485     1478       -7

codecov[bot] avatar Nov 25 '19 04:11 codecov[bot]

I can't merge this. The design isn't OK.

Creating a Network in swarmkit is like creating a Service or a Secret. The user is providing a desired state, which swarmkit will later attempt to fulfill. This is an operation that cannot fail. Even if the user's desired state is invalid, it's not swarmkit's place to delete that object. This is, admittedly, made difficult by the fact that Networks cannot be updated. However, it is not the case that this operation could never succeed. If the user deletes the first network, then the second network may be valid and pass allocation.

While the behavior here may feel odd, it isn't errant. What kind of problems is this behavior causing in practice?

dperny avatar Sep 25 '20 17:09 dperny

Additionally, waiting for the network allocation to succeed or fail could possibly take a long time, and in the meantime, the request will be hanging open. For example, if the network uses a third-party IPAM driver, and the driver is slow to respond, then we may be stuck waiting.

dperny avatar Sep 25 '20 17:09 dperny