cloudstack-kubernetes-provider

Fix providerID handling + label sanitizing

Open hrak opened this issue 2 years ago • 7 comments

This PR changes how the CloudStack Cloud Controller Manager handles the kubelet providerID, moving to the more standardized format used by several other CCMs, such as the OpenStack and vSphere ones. The changes were needed to make node labels work again.

The providerID is configured either by setting the kubelet command-line flag --provider-id (deprecated) or by using the providerID setting in the kubelet config YAML. The format of the value is <providername>://region/instance-id, so for a CloudStack platform without a region it would be, for example, cloudstack:///4e7689bc-99ea-43d8-8c37-5ff511c01665. With the region being parsed from the providerID, this PR also addresses #39.
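
To illustrate the format described above, here is a minimal sketch of how such a providerID could be split into region and instance ID. It is not the code from this PR: the helper name parseProviderID and the regular expression are assumptions made for illustration, and the error text simply mirrors the message the controller logs.

```go
package main

import (
	"fmt"
	"regexp"
)

// providerIDPattern matches "cloudstack://<region>/<instance-id>".
// The region may be empty, which yields three consecutive slashes.
// NOTE: this pattern is an illustrative assumption, not the PR's actual regex.
var providerIDPattern = regexp.MustCompile(`^cloudstack://([^/]*)/([^/]+)$`)

// parseProviderID is a hypothetical helper that extracts the region and the
// instance ID from a providerID string.
func parseProviderID(providerID string) (region, instanceID string, err error) {
	m := providerIDPattern.FindStringSubmatch(providerID)
	if m == nil {
		return "", "", fmt.Errorf("ProviderID %q didn't match expected format %q",
			providerID, "cloudstack://region/InstanceID")
	}
	return m[1], m[2], nil
}

func main() {
	ids := []string{
		"cloudstack:///4e7689bc-99ea-43d8-8c37-5ff511c01665", // no region
		"cloudstack://4e7689bc-99ea-43d8-8c37-5ff511c01665",  // missing a slash
	}
	for _, id := range ids {
		region, instanceID, err := parseProviderID(id)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("region=%q instanceID=%q\n", region, instanceID)
	}
}
```

With a pattern like this, a value with only two slashes is rejected because there is no separator between the region and the instance ID.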

It also implements the two interface methods InstanceShutdownByProviderID and InstanceShutdown.

It also fixes the way node labels are sanitized. The old regex approach would in some cases strip allowed characters: for example, the zone name Development-Internal would turn into DevelopmentInternal, although - is allowed in label values. The new approach converts all characters that are not allowed in a label value to underscores:

Development-Internal -> Development-Internal
Small Instance (4 GB / 2 CPU) -> Small_Instance__4_GB___2_CPU

hrak avatar May 05 '23 06:05 hrak

I realize that these changes should probably be reflected in the README. Will address that shortly.

hrak avatar May 05 '23 06:05 hrak

@hrak @vishesh92 it looks like this requires some other changes, such as https://github.com/Leaseweb/cloudstack-kubernetes-provider/commit/323b493f9d35dc0b6c0f6c0910e73043aac1ded6

weizhouapache avatar May 28 '24 12:05 weizhouapache

I am seeing the below errors after applying the changes. I am not sure about the root cause. I launched a CKS cluster and updated the image.

E0606 13:02:31.408425       1 node_lifecycle_controller.go:149] error checking if node k8s-1273-node-18f3588c91c exists: ProviderID "cloudstack://2d35066b-a8e4-4c2a-b400-256db0a7d2cd" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:36.515139       1 node_lifecycle_controller.go:149] error checking if node k8s-1273-node-18f3588c91c exists: ProviderID "cloudstack://2d35066b-a8e4-4c2a-b400-256db0a7d2cd" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:37.025198       1 node_controller.go:249] Error getting instance metadata for node addresses: ProviderID "" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:37.025248       1 node_controller.go:249] Error getting instance metadata for node addresses: ProviderID "" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:37.025255       1 node_controller.go:249] Error getting instance metadata for node addresses: ProviderID "" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:37.025261       1 node_controller.go:249] Error getting instance metadata for node addresses: ProviderID "" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:37.025267       1 node_controller.go:249] Error getting instance metadata for node addresses: ProviderID "" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:41.584449       1 node_lifecycle_controller.go:149] error checking if node k8s-1273-node-18f3588c91c exists: ProviderID "cloudstack://2d35066b-a8e4-4c2a-b400-256db0a7d2cd" didn't match expected format "cloudstack://region/InstanceID"
E0606 13:02:46.648644       1 node_lifecycle_controller.go:149] error checking if node k8s-1273-node-18f3588c91c exists: ProviderID "cloudstack://2d35066b-a8e4-4c2a-b400-256db0a7d2cd" didn't match expected format "cloudstack://region/InstanceID"

vishesh92 avatar Jun 06 '24 13:06 vishesh92

@vishesh92 did you change the provider name from external-cloudstack to cloudstack?

weizhouapache avatar Jun 06 '24 14:06 weizhouapache

Yes, I did change that. Before that, the pod was crashing due to the wrong provider name.

vishesh92 avatar Jun 06 '24 14:06 vishesh92

According to the PR, the providerID should have three slashes instead of two.

cloudstack://2d35066b should be cloudstack:///2d35066b
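
One way to see why the slash count matters is to treat the providerID as a URL. The sketch below uses Go's standard net/url package purely for illustration (the helper name splitProviderURL is hypothetical, and the CCM itself may parse the value differently): with two slashes the ID lands in the host position, where a region would go, and the path is empty; with three slashes the host is empty and the ID is in the path.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitProviderURL is a hypothetical helper that returns the host and path
// components of a providerID when it is parsed as a URL.
func splitProviderURL(raw string) (host, path string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	for _, raw := range []string{"cloudstack://2d35066b", "cloudstack:///2d35066b"} {
		host, path, err := splitProviderURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-24s host=%q path=%q\n", raw, host, path)
	}
}
```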

weizhouapache avatar Jun 06 '24 14:06 weizhouapache

I tested the above PR with Kubernetes 1.28. It doesn't seem to add the zone or region labels. I assume this is because the implementation of the InstancesV2 interface needs some more changes. Moving this PR to the milestone for the next release for now.

vishesh92 avatar Jun 07 '24 10:06 vishesh92