Permission denied creating the data directory
I'm stuck with the following error when trying to create any kind of CockroachDB cluster using the operator:
E240215 20:18:40.885312 1 1@cli/clierror/check.go:35 [-] 1 ERROR: connection lost.
E240215 20:18:40.885312 1 1@cli/clierror/check.go:35 [-] 1 +creating data directory: mkdir /cockroach/cockroach-data/auxiliary: permission denied
ERROR: connection lost.
creating data directory: mkdir /cockroach/cockroach-data/auxiliary: permission denied
Failed running "start"
The cluster manifest might look like this:
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: primary-crdb
spec:
  cockroachDBVersion: v23.1.11
  dataStore:
    pvc:
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "1Gi"
        storageClassName: primary-nfs
        volumeMode: Filesystem
  nodes: 3
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 2Gi
  tlsEnabled: true
The storage class is for csi-driver-nfs and leads to the following directory tree:
$ ls -lahF /<nfs-csi-dir>/*
/<nfs-csi-dir>/pvc-40b518b5-bccc-4610-b804-0bd2175f5eed:
total 18K
drwxrwsr-x 2 root 1000581000 2 Feb 11 16:26 ./
drwxr-xr-x 6 root root 6 Feb 11 17:07 ../
The CockroachDB pod manifest (kubectl get pods primary-crdb-0 --output yaml) has the following security context:
securityContext:
  fsGroup: 1000581000
  runAsUser: 1000581000
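This may explain why the permissions don't add up: the volume directory is owned by root with group 1000581000, and the pod sets runAsUser and fsGroup but no runAsGroup.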
For comparison, using this storage setup, it is possible to create a working mount like this:
...
containers:
  - name: busybox
    image: busybox:1.28
    command: [ "sh", "-c", "sleep 1h" ]
    volumeMounts:
      - name: data
        mountPath: "/test"
securityContext:
  runAsUser: 2000
  runAsGroup: 2000
  fsGroup: 2000
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test
When creating a file (touch /test/file) from inside the container the directory tree looks like this:
$ ls -lahF /<nfs-csi-dir>/*
/<nfs-csi-dir>/pvc-730e175e-af46-4e48-b4e4-5a1dd568307d:
total 19K
drwxrwsr-x 2 root 2000 3 Feb 11 17:14 ./
drwxr-xr-x 6 root root 6 Feb 11 17:07 ../
-rw-rw-r-- 1 2000 2000 0 Feb 11 17:14 file
It works because the owner and group all match.
I'm wondering whether the operator should specify runAsGroup, or whether there is something unusual about my setup and this should not be necessary at all.
The locations in the code would be the following:
- https://github.com/cockroachdb/cockroach-operator/blob/v2.12.0/pkg/resource/statefulset.go#L208
- https://github.com/cockroachdb/cockroach-operator/blob/v2.12.0/pkg/resource/job.go#L95
Even though I don't have much experience in self-hosting storage for Kubernetes, I would say adding runAsGroup is the right idea and I'm happy to create a PR if wanted.
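As a rough sketch of the outcome I have in mind, the pod security context generated by the operator would then look something like the following. The runAsGroup value here is an assumption on my part: it simply mirrors the existing fsGroup value, and I have not verified that this alone resolves the error on every storage backend.

```yaml
# Hypothetical pod-level securityContext after the proposed change.
# fsGroup and runAsUser are the values the operator sets today;
# runAsGroup is the proposed addition (assumed to mirror fsGroup).
securityContext:
  fsGroup: 1000581000
  runAsUser: 1000581000
  runAsGroup: 1000581000
```

This matches the shape of the busybox test above, where the user, group, and fsGroup are all set explicitly.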
same issue
We have the same issue. For us it shows up when we try to trigger a backup with something like:
kubectl exec \
  --namespace cockroachdb \
  --stdin \
  --tty \
  db-0 \
  --container=db \
  -- ./cockroach sql \
  --certs-dir=/cockroach/cockroach-certs \
  --host=localhost \
  --execute "BACKUP INTO 'nodelocal://1/backups/' as of system time '-10s'"
Which fails with an error like:
ERROR: opening object for writing: creating target local directory "/cockroach/cockroach-data/extern/backups/2025/03/31-132059.48": mkdir /cockroach/cockroach-data/extern/backups/2025: permission denied
Some thoughts on this problem:
The cockroach containers currently run as a user that is not declared in the container image. This makes the container unnecessarily hard to debug, because it is not immediately obvious that it is intended to run this way. Instead, it looks like an error or a process going haywire.
What I would have expected:
- The container being built with an application user cockroach
- That user being used as the standard user of the container
- All files in /cockroach being owned by that user
To be frank, the fact that this is not the case makes it look like you don't know what you are doing, which is quite terrifying.
Same issue on GCP with local-ssd; it works with pd-ssd.