
Helm does not upgrade CRDs

Open · danbarr opened this issue 3 months ago · 1 comment

Problem

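Helm does not upgrade CRDs during helm upgrade operations. This is a known, deliberate Helm 3 limitation intended to prevent accidental data loss. The chart's crds/ directory is special-cased and only processed during the initial installation; on upgrade, Helm neither updates existing CRDs nor installs CRDs that were added to the chart later.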

Current Behavior

Our documentation instructs users to upgrade CRDs with:

helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds

What actually happens:

  • Helm reports success
  • New chart revision is created
  • CRDs remain unchanged (even if CRD definitions in the chart have been updated)

Impact

  1. End Users: Users following our documentation have stale CRDs and don't receive CRD updates
  2. Development Workflow: task operator-install-crds doesn't actually update CRDs after the first run
  3. Recent Changes: The default proxy mode change in #2211 won't reach users who upgrade via Helm
  4. Silent Failures: Users have no indication that their CRDs are out of date
  5. User Trust: This has been our documented approach all along
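One way to surface the silent drift described above is to diff the CRDs packaged in a chart version against what the cluster is serving. This is a minimal sketch, not something the project ships: the function name crds_in_sync is hypothetical, and it assumes helm and kubectl are on the PATH and pointed at the target cluster.

```shell
# Sketch: report whether the CRDs packaged in a chart version match the cluster.
CRDS_CHART="oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds"

# crds_in_sync VERSION  (hypothetical helper)
# Exit status: 0 = no difference reported, 1 = difference reported, 2 = chart could not be read.
crds_in_sync() {
  local version="$1" manifest
  # helm show crds prints the contents of the chart's crds/ directory.
  manifest="$(helm show crds "$CRDS_CHART" --version "$version")" || return 2
  # kubectl diff exits 0 when nothing would change and 1 when something would.
  printf '%s\n' "$manifest" | kubectl diff -f - >/dev/null
}
```

A non-zero result from something like crds_in_sync 0.0.52 is a prompt to look at the full kubectl diff output rather than proof of stale CRDs, since the diff may also include metadata-only changes.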

Proposed Solutions

Option 1: Documentation only

Keep the Helm chart but document that upgrades require kubectl apply. This is the state we're in now, and stacklok/docs-website#283 fixes the docs.

User workflow:

# Install CRDs first time only (Helm)
helm install toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds

# Upgrade CRDs (kubectl): one apply per CRD file
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/toolhive-operator-crds-0.0.52/deploy/charts/operator-crds/crds/toolhive.stacklok.dev_mcpservers.yaml
kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/toolhive-operator-crds-0.0.52/deploy/charts/operator-crds/crds/toolhive.stacklok.dev_mcpgroups.yaml
# ...and so on for each CRD
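The per-CRD applies above could be scripted as a loop. The sketch below is illustrative only: crd_url and apply_crds are hypothetical helpers, and CRD_NAMES lists just the four CRDs named in this issue, not the full set in the chart.

```shell
# Sketch: apply every listed CRD from one chart tag.
CRD_TAG="toolhive-operator-crds-0.0.52"
# Assumption: only the CRDs mentioned in this issue; extend as the chart grows.
CRD_NAMES="mcpservers mcpgroups mcpregistries mcptoolconfigs"

# crd_url TAG NAME: print the raw manifest URL for one CRD.
crd_url() {
  printf 'https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/%s/deploy/charts/operator-crds/crds/toolhive.stacklok.dev_%s.yaml\n' "$1" "$2"
}

# apply_crds TAG: run kubectl apply once per CRD name, stopping at the first failure.
apply_crds() {
  for name in $CRD_NAMES; do
    kubectl apply -f "$(crd_url "$1" "$name")" || return 1
  done
}
```

Running apply_crds "$CRD_TAG" would then replace the individual commands, though the name list still has to be maintained by hand, which is the maintenance burden noted in the disadvantages.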

Disadvantages:

  • Confusing: why Helm for install but kubectl for upgrade?
  • Users will continue to use helm upgrade despite documentation
  • toolhive-operator-crds Helm release will get out of sync with actual CRD versions
  • Gets increasingly cumbersome as we add more CRDs

Option 2: Add single CRDs manifest release asset

Generate a single manifest file (crds.yaml) containing all 8 CRDs and include it as a release asset with toolhive-operator-crds releases.

Advantages:

  • Simple for users: single file to download and apply
  • Works reliably with kubectl apply
  • Standard practice in the ecosystem
  • Easy to automate in release workflow

Disadvantages:

  • Still leaves the "why Helm for install but kubectl for upgrade?" confusion, and the chart and the manifest are likely to drift apart

Optionally, we could consider dropping Helm for the CRDs altogether. If its only value is the initial installation, it ends up no easier than a single kubectl apply from a combined manifest.

Implementation:

# Generate combined file
cat deploy/charts/operator-crds/crds/*.yaml > crds.yaml
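A plain cat relies on every source file already starting with a YAML document separator. As a slightly more defensive sketch, the hypothetical helper below writes an explicit separator before each file; it assumes the output file lives outside the source directory and that the consumer tolerates the empty documents this can produce when a file already begins with a separator.

```shell
# combine_crds SRC_DIR OUT_FILE  (hypothetical helper)
# Write every *.yaml in SRC_DIR to OUT_FILE, each preceded by a '---' separator.
combine_crds() {
  src_dir="$1"
  out_file="$2"
  : > "$out_file"                      # start from an empty output file
  for f in "$src_dir"/*.yaml; do
    # An unmatched glob stays literal, so this catches an empty directory.
    [ -e "$f" ] || { echo "no CRD files in $src_dir" >&2; return 1; }
    # The leading newline protects against a previous file with no trailing newline.
    printf '\n---\n' >> "$out_file"
    cat "$f" >> "$out_file"
  done
}
```

In the release workflow this could be called as combine_crds deploy/charts/operator-crds/crds crds.yaml.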

New user workflow:

# Install/upgrade latest CRDs from stable URL
kubectl apply -f https://github.com/stacklok/toolhive/releases/latest/download/crds.yaml

# Install/upgrade CRDs from specific release
kubectl apply -f https://github.com/stacklok/toolhive/releases/download/toolhive-operator-crds-0.x.x/crds.yaml

Option 3: New Helm approach

Alternatively, we could investigate other approaches seen in the ecosystem. For example, cert-manager distributes a single Helm chart with crds.enabled (default false) and crds.keep (default true) values. If you install with crds.enabled=true, Helm upgrades also upgrade the CRDs. cert-manager still gives you the option of managing the CRDs separately, and provides a single .yaml file for convenience. See https://cert-manager.io/docs/installation/upgrade/#crds-managed-using-helm
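For comparison with the other options, a user workflow under this approach might look like the sketch below. It is purely hypothetical: the ToolHive charts do not have crds.enabled or crds.keep values today, the operator chart location is an assumption, and upgrade_operator is an illustrative name. The approach depends on the chart rendering CRDs as regular templated resources instead of shipping them in crds/, which is what lets Helm upgrade them.

```shell
# Hypothetical workflow if the operator chart adopted cert-manager-style values.
# Assumption: chart location; crds.enabled and crds.keep do not exist in ToolHive today.
OPERATOR_CHART="oci://ghcr.io/stacklok/toolhive/toolhive-operator"

# upgrade_operator MANAGE_CRDS  (hypothetical helper)
# MANAGE_CRDS=true  lets Helm install and upgrade the CRDs with the chart.
# MANAGE_CRDS=false leaves the CRDs to be managed separately, e.g. with kubectl apply.
upgrade_operator() {
  helm upgrade -i toolhive-operator "$OPERATOR_CHART" \
    --set "crds.enabled=$1" \
    --set crds.keep=true
}
```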

danbarr · Nov 08 '25 00:11

To observe/reproduce this:

In a fresh cluster, install an old version of the CRDs chart (0.0.33):

$ helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds --version 0.0.33

# Observe that it doesn't contain the newer McpPort property, only TargetPort:
$ kubectl explain mcpservers.toolhive.stacklok.dev.spec.mcpPort

...
error: field "mcpPort" does not exist

Then, upgrade to the latest (0.0.52 as of now):

$ helm upgrade -i toolhive-operator-crds oci://ghcr.io/stacklok/toolhive/toolhive-operator-crds --version 0.0.52

# Again, McpPort is not present:
$ kubectl explain mcpservers.toolhive.stacklok.dev.spec.mcpPort

error: field "mcpPort" does not exist

# The newer CRDs aren't installed, either:
$ kubectl get crd

NAME                                   CREATED AT
mcpregistries.toolhive.stacklok.dev    2025-11-08T00:59:19Z
mcpservers.toolhive.stacklok.dev       2025-11-08T00:59:19Z
mcptoolconfigs.toolhive.stacklok.dev   2025-11-08T00:59:20Z

But applying the CRD directly with kubectl does update it:

$ kubectl apply -f https://raw.githubusercontent.com/stacklok/toolhive/refs/tags/toolhive-operator-crds-0.0.52/deploy/charts/operator-crds/crds/toolhive.stacklok.dev_mcpservers.yaml

# Now the CRD has actually been updated:
$ kubectl explain mcpservers.toolhive.stacklok.dev.spec.mcpPort

...
FIELD: mcpPort <integer>

DESCRIPTION:
    McpPort is the port that MCP server listens to
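The manual kubectl explain check used in this reproduction could be wrapped in a small script, for example to run after an upgrade. This is a sketch under the assumption that kubectl explain exits non-zero when a field is missing, as the error output above suggests; crd_has_field and check_mcp_port are hypothetical names.

```shell
# crd_has_field RESOURCE FIELD_PATH  (hypothetical helper)
# Succeeds when kubectl explain can resolve the field in the schema the cluster serves.
crd_has_field() {
  kubectl explain "$1.$2" >/dev/null 2>&1
}

# check_mcp_port: the same check as in the reproduction, reported as one line.
check_mcp_port() {
  if crd_has_field mcpservers.toolhive.stacklok.dev spec.mcpPort; then
    echo "mcpPort present: CRD is updated"
  else
    echo "mcpPort missing: CRD is stale"
  fi
}
```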

danbarr · Nov 08 '25 01:11