
cnpg/cluster - spec.externalClusters not an array in cluster.yaml

Open bartekd123 opened this issue 1 month ago • 16 comments

I see that cluster.yaml is just updated, and when I deploy cnpg/cluster I get the following error:

```
Error: Cluster.postgresql.cnpg.io "dbName" is invalid: [spec.externalClusters: Invalid value: "null": spec.externalClusters in body must be of type array: "null", <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]
```

Looking at `cluster\templates\cluster.yaml`, I see this: `{{ include "cluster.externalClusters" . | nindent 2 }}`

Should there be a `_externalClusters.tpl` as well?

Here is my helm command:

```shell
helm upgrade --install --timeout=600s dbName cnpg/cluster \
  --set cluster.imagePullPolicy=Always \
  --set nameOverride=dbName \
  --set fullnameOverride=dbName \
  --set-string version.postgresql=15 \
  --set cluster.instances=1 \
  --set cluster.enableSuperuserAccess=true \
  --set cluster.superuserSecret=postgresuser-secret
```

bartekd123 avatar Dec 15 '25 17:12 bartekd123

Hi @bartekd123! I'm Dosu and I’m helping the charts team.

The error happens because the CloudNativePG CRD requires spec.externalClusters to always be an array, but the Helm chart renders it as null when not set, which fails validation (docs). In the chart, externalClusters is rendered using the cluster.externalClusters partial (from _external_clusters.tpl), not directly in cluster.yaml. If you deploy in standalone mode (the default), the template leaves externalClusters empty, which Helm outputs as null instead of an empty array—triggering the error (template source).
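The YAML mechanics behind the error are worth spelling out: a key rendered with no value parses as null, which the CRD's OpenAPI schema rejects, while an explicit `[]` passes. A minimal side-by-side (unrelated manifest fields elided):

```yaml
# What the chart renders in standalone mode today: the key is
# present but valueless, so YAML parses its value as null.
spec:
  externalClusters:
---
# What the CRD schema accepts when no external clusters exist.
spec:
  externalClusters: []
```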

There is no _cluster.tpl file; cluster.yaml uses partials like cluster.bootstrap (from _bootstrap.tpl) and cluster.externalClusters (from _external_clusters.tpl) to build the manifest (template structure).

To work around this, you can patch the chart to always render externalClusters as an empty array when not set, or add a minimal recovery/replica config that triggers the array rendering. Until the chart is updated to handle this automatically, this is the most reliable fix (issue discussion).

To reply, just mention @dosu.



dosubot[bot] avatar Dec 15 '25 17:12 dosubot[bot]

@dosu I don't think this was an issue until this week. Is it because of the change to cluster.yaml? Is modifying the CRD on the cluster a permanent solution? Or is this a bug that will be fixed, with the CRD change being just a temporary workaround?

what did you mean by "you can patch the chart to always render externalClusters as an empty array when not set, or add a minimal recovery/replica config that triggers the array rendering"?

Can something be done at the helm install of the DB (adding another --set option to my helm command) or is the only solution at the CRD?

bartekd123 avatar Dec 15 '25 17:12 bartekd123

Yes, this issue started after a recent change to the chart templates—specifically, PR #576 (merged Dec 14, 2025) refactored how externalClusters is rendered in cluster.yaml. In standalone mode, the new template leaves externalClusters empty, which Helm/YAML interprets as null, causing the CRD validation error you’re seeing (details).

Modifying the CRD to accept null is not a permanent or recommended solution. The correct fix is to patch the chart template so that externalClusters is always rendered as an empty array (externalClusters: []) when not set. This keeps the manifest valid and satisfies the CRD schema (template code).

By “patch the chart,” I mean editing `charts/cluster/templates/_external_clusters.tpl` so that, in standalone mode, it outputs:

```yaml
externalClusters: []
```

instead of leaving the key empty. This is a simple template change.

There isn’t a Helm --set workaround for this—externalClusters isn’t exposed as a value you can set directly, and the chart logic only populates it for recovery/replica modes. So, the only reliable workaround right now is to patch the template or post-process the manifest before applying it. Adding a minimal recovery/replica config would trigger array rendering, but that’s not a general solution for standalone clusters.
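To make that concrete, here is a hypothetical sketch of what such a fallback in `charts/cluster/templates/_external_clusters.tpl` could look like. The define name is taken from the include quoted earlier in the thread; the `.Values.recovery`/`.Values.replica` condition is illustrative (the chart's actual mode logic may differ), and the recovery/replica branch body is elided since only the standalone fallback changes:

```yaml
{{- define "cluster.externalClusters" -}}
{{- if or .Values.recovery .Values.replica }}
externalClusters:
  {{- /* existing recovery/replica entries rendered here */ -}}
{{- else }}
{{- /* Standalone mode: emit an explicit empty array instead of
       nothing, so the manifest satisfies the CRD's array schema. */}}
externalClusters: []
{{- end }}
{{- end -}}
```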

This is a bug and should be fixed in the chart; the workaround is just temporary until that happens.




dosubot[bot] avatar Dec 15 '25 17:12 dosubot[bot]

Great thank you @dosu

bartekd123 avatar Dec 15 '25 18:12 bartekd123

You're welcome!




dosubot[bot] avatar Dec 15 '25 18:12 dosubot[bot]

@itay-grudev and @phisco could you probably take a look into this since #576 came from you?

jkleinlercher avatar Dec 15 '25 21:12 jkleinlercher

Can you guys provide me with a minimal set of values that reproduces the issue? It somehow slipped past the tests. With your values I'll both understand the issue and add it to the test suite.

itay-grudev avatar Dec 15 '25 21:12 itay-grudev

Sure @itay-grudev , here is what I did. Let me know if you have any questions.

Obviously this first:

```shell
kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.28/releases/cnpg-1.28.0.yaml
```

Then this to install the DB:

```shell
kubectl create secret generic postgresuser-secret \
  --from-literal=username=postgres \
  --from-literal=password='Password1!'

helm upgrade --install --timeout=600s dbName cnpg/cluster \
  --set cluster.imagePullPolicy=Always \
  --set nameOverride=dbName \
  --set fullnameOverride=dbName \
  --set-string version.postgresql=15 \
  --set cluster.instances=1 \
  --set cluster.enableSuperuserAccess=true \
  --set cluster.superuserSecret=postgresuser-secret
```
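If it helps while debugging, you can render the chart locally and inspect the offending field without applying anything to the cluster; this assumes Helm is installed and the `cnpg` repo is already added:

```shell
# Render the manifest and show the externalClusters field; a bare
# "externalClusters:" (null) here is what fails CRD validation.
helm template dbName cnpg/cluster --set cluster.instances=1 \
  | grep -n -A1 'externalClusters'
```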

Of note: I tried this in PowerShell and I don't think it hits the issue, but if I try with Windows CMD I do see it.

BTW, my workaround is this instead, which uses the older Helm chart:

```shell
helm upgrade --install --timeout=600s dbName \
  https://github.com/cloudnative-pg/charts/releases/download/cluster-v0.4.0/cluster-0.4.0.tgz \
  --set cluster.imagePullPolicy=Always \
  --set nameOverride=dbName \
  --set fullnameOverride=dbName \
  --set-string version.postgresql=15 \
  --set cluster.instances=1 \
  --set cluster.enableSuperuserAccess=true \
  --set cluster.superuserSecret=postgresuser-secret
```
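An equivalent way to pin the older chart, rather than pointing at the release tarball directly, is Helm's `--version` flag (the remaining `--set` flags are abbreviated here):

```shell
# Pin the chart to 0.4.0 from the cnpg repo until the fix ships;
# pass the other --set flags exactly as before.
helm upgrade --install --timeout=600s dbName cnpg/cluster \
  --version 0.4.0 \
  --set cluster.instances=1 \
  --set cluster.superuserSecret=postgresuser-secret
```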

bartekd123 avatar Dec 15 '25 22:12 bartekd123

Thanks. I'll try to reproduce it now.

itay-grudev avatar Dec 15 '25 22:12 itay-grudev

This is my values example:

```yaml
type: postgresql
mode: standalone
version:
  postgresql: "16"
cluster:
  instances: 1
  postgresql:
    parameters:
      max_connections: "100"
  initdb:
    database: backstage
    secret:
      name: cnpg-backstage-secret
  monitoring:
    enabled: true
  superuserSecret: cnpg-superuser-secret
  roles:
    - name: backstage
      ensure: present
      comment: backstage-admin-user
      login: true
      inherit: true
      superuser: true
      createdb: true
      passwordSecret:
        name: cnpg-backstage-secret
  annotations:
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
backups:
  enabled: false
```
jkleinlercher avatar Dec 15 '25 22:12 jkleinlercher

I also faced this. A rollback to 0.4.0 for the time being seems fine though.

pschichtel avatar Dec 17 '25 18:12 pschichtel

We're experiencing this issue, also. As others, we rolled back to 0.4.0 until a fix is merged.

mprimeaux avatar Dec 18 '25 16:12 mprimeaux

Same here, though I don't understand why the error appears in Argo CD but not via the CLI.

So I opened #760 to skip this key entirely in standalone mode.

ArnaudTA avatar Dec 18 '25 23:12 ArnaudTA

hi @dosu I had successfully used these charts until today, when I started seeing this error:

```
[spec.externalClusters: Invalid value: "null": spec.externalClusters in body must be of type array: "null", <nil>: Invalid value: null: some validation rules were not checked because the object was invalid; correct the existing errors to complete validation
```

is there any reason I'd be running into this problem just now despite having successfully used this chart in the past week?

Jared-Prime avatar Jan 08 '26 00:01 Jared-Prime

This started happening because of a recent change in the chart templates (specifically PR #576, released in chart v0.5.0) that refactored how the externalClusters field is rendered. In standalone mode, the new template leaves externalClusters empty, which Helm/YAML interprets as null instead of an empty array. The CloudNativePG CRD requires externalClusters to always be an array, so this triggers the validation error you're seeing. This wasn't an issue in previous chart versions (like 0.4.0), which is why you didn't see the error before (details).

This is a bug in the chart, not intended behavior, and modifying the CRD is not recommended. There's no Helm --set workaround; the only current options are to patch the template to always render `externalClusters: []` in standalone mode, or to temporarily use chart version 0.4.0 until a fix is released (discussion and fix).




dosubot[bot] avatar Jan 08 '26 01:01 dosubot[bot]

I believe Helm v4.0.4 may be stricter about null values, which is why I did not see the error until I upgraded Helmfile to v1.2.3, which in turn upgrades Helm to v4.0.4. It would have been possible to specify `externalClusters: []` in the supplied values for the chart had it not been for this particular bug in the template. It's not necessary to downgrade the chart; it may be sufficient to downgrade Helmfile to v1.2.2. I'll edit this message to convey whether that strategy works or not.

Jared-Prime avatar Jan 08 '26 15:01 Jared-Prime