
Add standard for provider networks

kgube opened this issue 1 year ago • 8 comments

This is a WIP for a standard covering provider networks and public IP allocation. https://github.com/SovereignCloudStack/issues/issues/166 https://github.com/SovereignCloudStack/issues/issues/167

kgube, Apr 19 '24

We should probably just focus the standard on external provider networks/subnets, because that is ultimately what facilitates public IP networking, as https://github.com/SovereignCloudStack/issues/issues/167 and https://github.com/SovereignCloudStack/issues/issues/166 are calling for.

I have been trying to find a way to meaningfully incorporate the other suggestions from #522 (and its comments), such as OVN/L3 router HA, API extensions/plugins, port security, and policies. However, they are either orthogonal to the issues (like the HA topic), follow from the provider network requirements (like port security and, at least partially, the API extensions), or should be discussed more broadly elsewhere (again the HA topic, and policies, specifically the codification of, or deviation from, default policies).

kgube, Apr 19 '24

There was a short discussion in the community call today on the question of mandatory internet access vs. air-gapped infrastructure, which was raised in the IaaS call yesterday. The feedback was that we also want to support the standardization of private and air-gapped clouds, and thus we should not mandate internet access. We should, however, find a way to communicate to users whether internet access is available.

I will try to reword the standard to reflect this.

kgube, Jun 06 '24

I have been trying to extend this standard to also apply to private and air-gapped clouds, and it turns out that this is somewhat difficult. Many of the decisions in the standard follow from the requirements and limitations of public IP address allocation, which may not apply in a more isolated cloud.

The approach I'm currently trying is to make the standard more conditional and discuss specific use cases, like:

  • If the CSP offers provider networks, there SHOULD only be one network available to projects by default, which will be called the default provider network. (For an air-gapped cloud, not having provider networks at all might be a valid choice.)
  • If the CSP allows allocation of public IP addresses, they MUST be allocated and routed via the default provider network. (This would cover the current focus of the standard, while still allowing, e.g., a private cloud behind a NAT.)
  • If the CSP allows users network access to their servers, this MUST happen via the default provider network. (This would cover the case where users in a private cloud can access their servers via private IPs, e.g. via VPN.)
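To make the conditional structure concrete: each MUST only applies once its precondition holds, so a cloud that offers neither public IPs nor network access trivially passes. A minimal sketch, assuming a hypothetical `CloudView` that records what a single project can observe, and assuming that the first visible provider network is treated as the default when there are several:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CloudView:
    """Hypothetical, simplified view of what a single project can observe."""
    provider_networks: List[str] = field(default_factory=list)
    public_ip_network: Optional[str] = None  # network public IPs come from, if offered
    access_network: Optional[str] = None  # network users reach servers through, if any


def check_rules(cloud: CloudView) -> List[str]:
    """Evaluate the three conditional rules; an empty list means no findings."""
    findings: List[str] = []
    if len(cloud.provider_networks) > 1:
        findings.append("SHOULD: only one provider network available by default")
    # Assumption: if several networks are visible, the first is taken as the default.
    default = cloud.provider_networks[0] if cloud.provider_networks else None
    if cloud.public_ip_network is not None and cloud.public_ip_network != default:
        findings.append("MUST: public IPs must come from the default provider network")
    if cloud.access_network is not None and cloud.access_network != default:
        findings.append("MUST: server access must go through the default provider network")
    return findings
```

An air-gapped cloud without provider networks (`CloudView()`) yields no findings, while a cloud that offers public IPs without a default provider network is flagged.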

I'm still working out the details though.

An alternative would of course be to scope the standard strictly to clouds with public IPs.

kgube, Jun 20 '24

Finding common ground regarding provider networks between public, private, and especially air-gapped clouds is hard, even with individually scoped rules. There is some overlap, but the requirements are very different.

I have now given up on that and made some changes to the draft to limit the standard to clouds that provide public IPs. This still has some issues, though, like the awkwardness of mandating public IPv6 support when the overall support of public IP addresses is not mandatory.

kgube, Jul 25 '24

@kgube Thanks for all this impressive work! I see that this is a difficult situation. Please do go into the meetings (Team IaaS) and raise awareness as soon as possible. Please try to get people to go into an in-depth break-out session with you.

mbuechse, Jul 25 '24

@kgube Overall it looks good to me. Besides neutron-dynamic-routing, it may make sense to also refer to ovn-bgp-agent, because OVN might take over this task in the future.

matfechner, Aug 21 '24

While working on the conformance tests, I stumbled upon OpenStack's network auto-allocation feature: https://docs.openstack.org/neutron/latest/admin/config-auto-allocation.html

I'm not sure how I missed that previously, but I am going to update the standard to conform to the auto-allocation requirements.

This is good news, because it will greatly simplify conformance in the presence of multiple provider networks or subnet pools (currently discouraged but allowed): we can simply mandate that the SCS default resources be marked as the exclusive auto-allocation defaults.
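To illustrate why the markings help: a test no longer has to guess which resources are the SCS defaults, it just looks up the ones flagged with `is_default`. A sketch, assuming the network and subnet pool listings have already been fetched as plain dicts shaped like Neutron's network and subnet pool objects (fields `router:external`, `is_default`, `ip_version`). The function name and the choice to require only an IPv6 default pool are illustrative assumptions, not part of the standard or its tests:

```python
from typing import Dict, Iterable, List


def find_auto_allocation_defaults(
    networks: List[dict],
    subnet_pools: List[dict],
    required_versions: Iterable[int] = (6,),
) -> dict:
    """Return the exclusive auto-allocation defaults or raise ValueError.

    `networks` and `subnet_pools` are dicts shaped like the objects Neutron
    returns from GET /v2.0/networks and GET /v2.0/subnetpools.
    """
    default_nets = [
        n for n in networks if n.get("router:external") and n.get("is_default")
    ]
    if len(default_nets) != 1:
        raise ValueError(
            f"expected exactly one default external network, found {len(default_nets)}"
        )
    pools_by_version: Dict[int, dict] = {}
    for pool in subnet_pools:
        if not pool.get("is_default"):
            continue
        version = pool["ip_version"]
        if version in pools_by_version:
            raise ValueError(f"more than one default IPv{version} subnet pool")
        pools_by_version[version] = pool
    # Assumption: only IPv6 is required (as in the current draft); IPv4 is optional.
    for version in required_versions:
        if version not in pools_by_version:
            raise ValueError(f"no default IPv{version} subnet pool")
    return {"network": default_nets[0], "subnet_pools": pools_by_version}
```

Additional non-default networks and pools can still exist; they are simply ignored by the lookup.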

kgube, Sep 12 '24

There haven't been many updates here, even though I have continuously been trying to build compliance tests for this. I have been restructuring the code back and forth without ever really reaching a runnable state (and I don't like pushing code that cannot at least run). Some aspects of the standard are just hard to test, some of the requirements are too loosely defined, some are somewhat circular, and some I am just personally unhappy with.

I will try to identify those issues and approach them more systematically:

The scope of the standard.

  • Currently, the requirements are limited to "CSPs [that] offer public IP addresses to projects". However, whether a CSP is supposed to be offering public IP addresses is impossible to determine for the compliance tests.
  • The current wording would also include private clouds that are offering public IPs for some selected projects, and would force them to offer public IPs to all of their projects.
  • Standardized IP allocation would also be useful for private clouds offering private IPs reachable via VPN, which are currently excluded from the standard. This would also aid the development of compliance tests, as they could be run against a private test environment.

I now think that the better approach is not to generally mandate public IP addresses, but to define external IP addresses as addresses reachable by a user with access to the OpenStack API, such that, e.g., automated tooling can talk directly to both the API and the VMs. CSPs offering public cloud services can then specifically be required to provide public IPs as external IPs. This will make the standard more useful and allow easier compliance testing, at least for the basic requirements. The requirement for public IP addresses could be enabled by an optional CLI flag or environment variable for the test script.
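A rough sketch of how the base requirement (reachable from where the API is used) and the optional public IP requirement could be split in a Python test script. The flag `--require-public-ip` and the environment variable `SCS_REQUIRE_PUBLIC_IP` are hypothetical names, and the TCP connect to an assumed open port (e.g. SSH) is only one possible way to probe reachability:

```python
import argparse
import ipaddress
import os
import socket
from typing import List, Optional

# Hypothetical name; not an existing SCS test option.
ENV_VAR = "SCS_REQUIRE_PUBLIC_IP"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="external IP checks (sketch)")
    parser.add_argument(
        "--require-public-ip",
        action="store_true",
        # The environment variable only sets the default; the flag can also enable it.
        default=os.environ.get(ENV_VAR, "").lower() in ("1", "true", "yes"),
        help="also require external IPs to be globally routable (public clouds)",
    )
    return parser.parse_args(argv)


def is_reachable(address: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Base requirement: the machine that talks to the OpenStack API can
    open a TCP connection to the external IP."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_external_ip(address: str, port: int, require_public: bool) -> List[str]:
    problems: List[str] = []
    if not is_reachable(address, port):
        problems.append(f"{address} is not reachable from the API user's location")
    if require_public and not ipaddress.ip_address(address).is_global:
        problems.append(f"{address} is not a public (globally routable) address")
    return problems
```

With this split, the same script could run unchanged against a private test environment (VPN-reachable private IPs pass the base check) and, with the flag set, against a public cloud.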

Support for multiple provider networks and/or multiple shared subnet pools per IP version.

  • The standard does not forbid this (though it's not recommended), and I'm not sure we should forbid it. However, compliance testing would then mean potentially trying multiple subnet pool/provider network combinations until we find one that meets our requirements.
  • This is mostly a hypothetical scenario, but if the standard allows it, the tests should be able to cover it. To test this scenario, we need to build the tests around it; it is not something we can easily add later.
  • I tried adding a requirement to set the is_default flags of the network auto-allocation feature on the mandated resources to circumvent this, but when I discussed this in last week's IaaS call, I got the suggestion to leave it as a recommendation until the SCS reference implementation supports this and providers have had a chance to test it.
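The testing cost of allowing several provider networks and subnet pools is easiest to see as a search over all combinations. A minimal sketch, where `meets_requirements` is a hypothetical placeholder for the real (and slow) check, such as creating a router and subnet, booting a server, and probing connectivity:

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple, TypeVar

N = TypeVar("N")  # provider network representation (left open here)
P = TypeVar("P")  # subnet pool representation (left open here)


def first_working_combination(
    networks: Iterable[N],
    subnet_pools: Iterable[P],
    meets_requirements: Callable[[N, P], bool],
) -> Optional[Tuple[N, P]]:
    """Try every (provider network, subnet pool) pair until one passes.

    Returns the first passing pair, or None if no combination works.
    """
    for network, pool in itertools.product(networks, subnet_pools):
        if meets_requirements(network, pool):
            return network, pool
    return None
```

In the worst case this runs the full check once per network and pool pair; with exclusive `is_default` markings, the search collapses to a single pair.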

There may be other parts of the standard that are not yet part of the reference implementation and may need a grace period (IPv6 dynamic routing?). It is much more useful to keep the is_default flag requirement in the standard and treat the whole standard as a recommendation until it has been properly verified. This is especially true because all of the requirements for auto-allocation (at least for IPv6) are already mandated by the standard; the only real addition is the flag. So, in the interest of getting the tests done, that is how I will proceed.

Compliance tests are limited to the project perspective.

  • The standard makes some global requirements, like the single standard provider network available to all projects, but the compliance tests can only check them from the perspective of a single project and cannot know whether the network is actually available to all projects.
  • This is not a big problem; in fact, it probably makes no difference to users if their standard provider network differs from that of other users, as long as it performs the same function. But it might be cleaner if the wording of the standard reflected this.

This is not a huge issue and does not actually affect the tests themselves (because, in this regard, this is the only way to do them). I will address this if I have time, and only once I'm done with the compliance tests.

Limited feedback on the standard.

  • Not really an issue of the standard itself, but something that has made me very cautious in working on the tests has been the limited feedback from CSPs on the standard.
  • Some of the decisions have been contentious in previous discussions, such as making IPv6 mandatory and IPv4 only recommended. It is not unlikely that parts of the requirements will change later, and the compliance tests with them.

There isn't really much I can do about this. I will try to build the compliance tests to be somewhat modular, but not beyond what aids general readability. There is no point in considering potential future test scenarios before we get there; in this case, refactoring will be easier than preparing.

kgube, Oct 07 '24

I've reached out to a set of CSPs. One gave qualified feedback: the only limiting factor is the mandatory IPv6 prefix. My suggestion is that we merge this now as a Draft and advocate for its adoption among the CSPs.

fkr, Mar 25 '25

More feedback from another CSP: seems solid so far.

fkr, Mar 30 '25

Classic mistake: a revert without sign-off. The revert does not qualify as a creative contribution, so I set the DCO check to pass. Will now merge.

mbuechse, Apr 08 '25