SELinux Policy: system_u:system_r:cachefiles_kernel_t:s0

Open woehrl01 opened this issue 1 year ago • 6 comments

What I'd like: I would like to enable fscache so that NFS files can be cached. Currently, when I run cachefilesd, I receive either:

About to bind cache
CacheFiles bind failed: errno 13 (Permission denied)

or

About to bind cache
CacheFiles bind failed: errno 22 (Invalid argument)

The latter happens if I specify what should be the correct SELinux context to reference:

secctx system_u:system_r:cachefiles_kernel_t:s0

It looks like the policy is missing in bottlerocket os:

[root@admin]# seinfo -t | grep cache
   cache_t

I would appreciate it if that policy could be added.

Related links:

  • https://bugzilla.redhat.com/show_bug.cgi?id=841425

Any alternatives you've considered:

woehrl01 avatar Jul 08 '24 14:07 woehrl01

Hello, thanks for submitting this feature request!

I've confirmed that Bottlerocket currently does not have this policy:

  • I searched the Core-kit repo selinux-policy package and found no results
  • I verified on a Bottlerocket instance:
    • AMI: bottlerocket-aws-k8s-1.24-x86_64-v1.20.2-536d69d0
    • Steps:
      1. SSM into the instance and enter the admin container
      2. yum install setools-console to install seinfo
[root@admin]# seinfo -t | grep cache
   cache_t

We will discuss within the team if it's viable to add this policy and will get back to you with the decision.

koooosh avatar Jul 25 '24 22:07 koooosh

I've been playing around with this, and I found a few things, but first some clarifications for others that find this issue:

> It looks like the policy is missing in bottlerocket os

By default, the cachefilesd package configures cachefiles_kernel_t as the SELinux context in /etc/cachefilesd.conf. You can skip this by commenting out the line as follows:

# secctx system_u:system_r:cachefiles_kernel_t:s0

That will force the process to use its parent's SELinux context. The Bottlerocket SELinux policy is very different from the refpolicy, which is what the cachefilesd project assumes is available on the host, and that is why it attempts to set the "standard" label for cachefilesd. This SELinux context isn't necessary as long as you run with a context that has sufficient privilege. With that, let's move on to my findings.
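The secctx change described above can also be scripted. This is only a sketch: disable_secctx is a hypothetical helper name, it assumes GNU sed (for -i and -E), and it assumes the config file is passed in by path (/etc/cachefilesd.conf in the default package layout mentioned above).

```shell
#!/bin/sh
# Hypothetical helper: comment out any active "secctx" line in a
# cachefilesd config file so the daemon keeps its parent's context.
# Usage: disable_secctx /etc/cachefilesd.conf
disable_secctx() {
    conf="$1"
    # Prefix uncommented secctx lines with "# ".
    # Lines that are already commented do not match and are left alone,
    # so running this twice is harmless.
    sed -i -E 's/^([[:space:]]*)(secctx[[:space:]])/\1# \2/' "$conf"
}
```

Because the pattern only matches lines that start with secctx (after optional whitespace), other directives such as dir are not touched.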

I first loaded the cachefiles kernel module, just as the systemd service for cachefilesd does:

modprobe -qab cachefiles
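Since modprobe -q stays quiet on failure, it can help to confirm the result before going further. The following is a rough sketch rather than part of the original steps: cachefiles_ready is a hypothetical helper, and it assumes cachefiles is built as a loadable module (a built-in driver would not be listed in /proc/modules) and that the device node is /dev/cachefiles.

```shell
#!/bin/sh
# Hypothetical helper: report whether the cachefiles module is loaded
# and its device node exists. Both arguments are optional and default
# to the usual Linux locations.
cachefiles_ready() {
    modules_file="${1:-/proc/modules}"
    device="${2:-/dev/cachefiles}"

    # /proc/modules lists one loaded module per line, name first.
    if ! grep -q '^cachefiles ' "$modules_file"; then
        echo "cachefiles module not loaded" >&2
        return 1
    fi

    # cachefilesd talks to the kernel through a character device.
    if [ ! -c "$device" ]; then
        echo "$device is not a character device" >&2
        return 1
    fi

    echo "cachefiles ready"
}
```

Calling cachefiles_ready with no arguments on the host checks the default paths.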

Then, I deployed a pod with the following spec:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fedora
spec:
  selector:
    matchLabels:
      name: fedora
  template:
    metadata:
      labels:
        name: fedora
    spec:
      containers:
        - name: fedora
          image: fedora
          command: ["sleep", "infinity"]
          securityContext:
            privileged: true
          # These will be accessed by cachefilesd
          volumeMounts:
            - mountPath: /dev/log
              name: journal
              readOnly: false
            - mountPath: /dev/cachefiles
              name: cache
              readOnly: false
      volumes:
        - name: journal
          hostPath:
            path: /dev/log
        - name: cache
          hostPath:
            path: /dev/cachefiles

This got me past the SELinux problems, but I keep getting this error:

CacheFiles bind failed: errno 95 (Operation not supported)

There are no AVC denials, though. I wonder if we are missing a kernel config to allow this, or if control_t isn't enough and this actually requires more privilege.
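For anyone repeating the AVC check, a small filter like the one below can pick denial records out of log text. It is a sketch under assumptions: avc_denials is a hypothetical helper, and it assumes denials reach the kernel log in the usual "avc:  denied" form, which may not hold if an audit daemon routes them elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print only SELinux AVC denial records from log
# text on stdin. Exits non-zero when no denial is found (grep's status).
# Example usage on the host: dmesg | avc_denials
avc_denials() {
    grep -E 'avc:[[:space:]]+denied'
}
```

An empty result with a non-zero exit status corresponds to the "no AVC denials" situation described above.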

arnaldo2792 avatar Aug 06 '24 01:08 arnaldo2792

I got more data on this. The error I got was likely because /var/cache/fscache (the default cache directory) was in the container's filesystem (which is overlayfs). After I used a different directory (/mnt from the host), I got an SELinux denial due to this rule:

https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/0ec249582f6975cd72c292c437db30105ccff51c/packages/selinux-policy/rules.cil#L347

This is because privileged: true sets control_t as the process's label, and control_t isn't trusted_s. I've been trying to make cachefilesd work by emulating privileged mode (providing all capabilities and running with the unconfined seccomp profile), but I think I'm missing access to all the devices, because now I keep getting:

Unable to open /dev/cachefiles: errno 1 (Operation not permitted)
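As an aside on the earlier errno 95 case: a quick way to check whether a candidate cache directory sits on overlayfs is to ask stat for the filesystem type. This is a sketch only. The function names are hypothetical, and it assumes GNU coreutils stat, which reports the type name through -f -c %T.

```shell
#!/bin/sh
# Hypothetical helpers for checking a candidate cache directory.

# Print the filesystem type name of the filesystem holding the path.
fs_type_of() {
    stat -f -c %T "$1"
}

# Succeed if the given type name refers to overlayfs.
is_overlay_name() {
    case "$1" in
        overlay|overlayfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Warn if the directory is backed by overlayfs, as a container's own
# root filesystem typically is.
check_cache_dir() {
    dir="$1"
    fs=$(fs_type_of "$dir") || return 2
    if is_overlay_name "$fs"; then
        echo "$dir is on $fs; consider a host-backed directory" >&2
        return 1
    fi
    echo "$dir is on $fs"
}
```

Running check_cache_dir /var/cache/fscache inside the container, and then on a hostPath mount such as /mnt, shows whether the two directories differ in the way described above.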

Sadly, I can't set a flag or a config option to grant access to all the devices. My config so far looks like this:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fedora
spec:
  selector:
    matchLabels:
      name: fedora
  template:
    metadata:
      labels:
        name: fedora
    spec:
      containers:
        - name: fedora
          image: fedora
          command: [sleep, infinity]
          securityContext:
            seccompProfile:
              type: Unconfined
            capabilities:
              add:
                - CHOWN
                - DAC_OVERRIDE
                - DAC_READ_SEARCH
                - FOWNER
                - FSETID
                - KILL
                - SETGID
                - SETUID
                - SETPCAP
                - LINUX_IMMUTABLE
                - NET_BIND_SERVICE
                - NET_BROADCAST
                - NET_ADMIN
                - NET_RAW
                - IPC_LOCK
                - IPC_OWNER
                - SYS_MODULE
                - SYS_RAWIO
                - SYS_CHROOT
                - SYS_PTRACE
                - SYS_PACCT
                - SYS_ADMIN
                - SYS_BOOT
                - SYS_NICE
                - SYS_RESOURCE
                - SYS_TIME
                - SYS_TTY_CONFIG
                - MKNOD
                - LEASE
                - AUDIT_WRITE
                - AUDIT_CONTROL
                - SETFCAP
                - MAC_OVERRIDE
                - MAC_ADMIN
                - SYSLOG
                - WAKE_ALARM
                - BLOCK_SUSPEND
            seLinuxOptions:
              level: s0
              role: system_r
              type: super_t
              user: system_u
          # These will be accessed by cachefilesd
          volumeMounts:
            - mountPath: /dev/log
              name: journal
              readOnly: false
            - mountPath: /dev/cachefiles
              name: cachefiles
              readOnly: false
            - mountPath: /mnt
              name: cache
              readOnly: false
      volumes:
        - name: journal
          hostPath:
            path: /dev/log
        - name: cachefiles
          hostPath:
            path: /dev/cachefiles
        - name: cache
          hostPath:
            path: /mnt/

arnaldo2792 avatar Dec 04 '24 02:12 arnaldo2792

@arnaldo2792 thank you, that's very interesting. So if I understand it right, it now requires privileged: true, but that would be blocked by SELinux again?

woehrl01 avatar Dec 15 '24 17:12 woehrl01

Hey @woehrl01, sorry for the very late response. Yes, pods with privileged: true end up with control_t instead of super_t (the most privileged label in Bottlerocket).

I have another idea. I'll give it a try and come back with more data.

arnaldo2792 avatar May 08 '25 05:05 arnaldo2792

Hey @woehrl01, unfortunately my other idea hit the same SELinux denial. The root cause of this problem is summarized in this issue:

https://github.com/bottlerocket-os/bottlerocket/issues/3791

We are blocked on having access to all the devices, and there isn't an option to request access to all the devices through pod configuration. There could be a workaround with either a Device Plugin or a DRA plugin, but that is overkill for what you want to accomplish; ideally, the container runtime would just do the right thing and grant the required privilege.

We will keep this open in case people with the same use case hit the problem.

arnaldo2792 avatar May 09 '25 20:05 arnaldo2792