helm: can't upgrade to 0.15.0 in place due to daemonset label selector change
The newest version of the k8s-device-plugin chart seems to have removed support for overriding the label selectors of each DaemonSet. Because the selectors can no longer be set through helm values, and spec.selector is immutable once a DaemonSet exists (the k8s API rejects any change to it), an in-place upgrade to v0.15.0 via helm is very difficult.
You can use helm template along with yq to observe this change. If you have both of these tools installed, use this one-liner to observe the label selectors for v0.14.5:
helm template nvidia-device-plugin nvdp/nvidia-device-plugin --version 0.14.5 --set gfd.enabled=true | yq e 'select(.kind == "DaemonSet") | select(.metadata.name == "nvidia-device-plugin-gpu-feature-discovery") | .spec.selector.matchLabels'
This results in these label selectors:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/instance: nvidia-device-plugin
These are the default label selectors for GFD, but they can be changed via gfd.nameOverride in the values.
However, in v0.15.0, the default label selectors have changed, and there is no way to use helm values to change them back to what they were before:
helm template nvidia-device-plugin nvdp/nvidia-device-plugin --version 0.15.0 --set gfd.enabled=true | yq e 'select(.kind == "DaemonSet") | select(.metadata.name == "nvidia-device-plugin-gpu-feature-discovery") | .spec.selector.matchLabels'
This results in these label selectors:
app.kubernetes.io/name: nvidia-device-plugin
app.kubernetes.io/instance: nvidia-device-plugin
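To check the two versions against each other in one step, something like the following could work. It is only a sketch: the function names gfd_selector and same_gfd_selector are made up for illustration, and it assumes the same nvdp repo alias, release name, and yq syntax as the commands above.

```shell
# Print the selector labels of the GFD DaemonSet rendered by a chart version.
# Assumes the "nvdp" repo alias and the release name used in this issue.
gfd_selector() {
  helm template nvidia-device-plugin nvdp/nvidia-device-plugin \
    --version "$1" --set gfd.enabled=true |
    yq e 'select(.kind == "DaemonSet") | select(.metadata.name == "nvidia-device-plugin-gpu-feature-discovery") | .spec.selector.matchLabels'
}

# Exit 0 only when both chart versions render an identical selector.
# The body runs in a subshell so the temporary variables do not leak.
same_gfd_selector() (
  old=$(gfd_selector "$1")
  new=$(gfd_selector "$2")
  [ "$old" = "$new" ]
)
```

Running `same_gfd_selector 0.14.5 0.15.0 || echo "selector changed"` should then flag the difference shown above, since a changed selector is exactly what makes the patch fail.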
Because these label selectors cannot be changed in v0.15.0 by any helm value, any attempt at an upgrade results in an error that looks like this:
Helm upgrade failed for release kube-system/nvidia-device-plugin with chart nvidia-device-plugin@0.15.0: cannot patch "nvidia-device-plugin-gpu-feature-discovery" with kind DaemonSet: DaemonSet.apps "nvidia-device-plugin-gpu-feature-discovery" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"nvidia-device-plugin", "app.kubernetes.io/name":"nvidia-device-plugin"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
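The only way around this that I can think of is not an in-place upgrade at all: delete the existing GFD DaemonSet first so that the upgrade creates a fresh one with the new selector. Here is a rough sketch of that idea. The function name recreate_gfd_daemonset is hypothetical, the kube-system default comes from the error above, and GFD pods would be missing between the delete and the upgrade.

```shell
# Hypothetical workaround sketch, not an in-place upgrade:
# delete the GFD DaemonSet, then let helm recreate it with the new selector.
# The namespace defaults to kube-system (taken from the error message above).
recreate_gfd_daemonset() {
  kubectl delete daemonset nvidia-device-plugin-gpu-feature-discovery \
    --namespace "${1:-kube-system}" --ignore-not-found &&
    helm upgrade nvidia-device-plugin nvdp/nvidia-device-plugin \
      --namespace "${1:-kube-system}" --version 0.15.0 --set gfd.enabled=true
}
```

The `&&` keeps helm from running if the delete fails. Any other values the release was installed with would need to be passed to helm upgrade as well; only gfd.enabled is shown here.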
Is this a bug in v0.15.0 of the chart, or am I missing some other way to change these label selectors?
Also, I'm happy to submit a PR to fix this, if this is indeed a bug.
In the last release, we merged the code from GFD into the device plugin repo and deprecated the gpu-feature-discovery repo itself. I believe the change in values is likely an oversight that occurred as part of this merge (or if done on purpose, the implications of it weren't obvious at the time).
/cc @ArangoGutierrez and @elezar for their thoughts on what to do here
@mrparkers As @klueska points out, this is a side effect of the migration and was not intentional. It should be considered a bug, especially if it is preventing in-place upgrades.
If you're willing to submit a patch that would address this, that would be great. Please open a PR so that @ArangoGutierrez and I can review.
This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.
This issue was automatically closed due to inactivity.