canonical-kubernetes localhost: kubelet cannot check disk space
Greetings,
I deployed Canonical Kubernetes with conjure-up on localhost (Ubuntu 16.04.2) using the default settings.
I'm here again with another issue: on the workers I've noticed that kubelet cannot check disk space, complaining that the zfs binary is not found. This is not really critical, but it means that heapster is not recording node/pod stats.
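For reference, the ZFS userland tools were installed on each worker with the stock Ubuntu package, roughly:
sudo apt-get install -y zfsutils-linux   # provides the zfs/zpool binaries kubelet is complaining about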
After installing zfsutils-linux manually on the workers, here are the errors I'm getting:
Feb 18 15:38:00 juju-f96834-9 kubelet[1505]: E0218 15:38:00.530263 1505 kubelet.go:1634] Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": failed to find information for the filesystem labeled "docker-images"
Feb 18 15:38:00 juju-f96834-9 kubelet[1505]: E0218 15:38:00.530288 1505 kubelet.go:1642] Failed to check if disk space is available on the root partition: failed to get fs info for "root": did not find fs info for dir: /var/lib/kubelet
Feb 18 15:38:05 juju-f96834-9 kubelet[1505]: E0218 15:38:05.043160 1505 handler.go:246] HTTP InternalServerError serving /stats/summary: Internal Error: failed RootFsInfo: did not find fs info for dir: /var/lib/kubelet
Feb 18 15:38:08 juju-f96834-9 kubelet[1505]: E0218 15:38:08.915959 1505 fs.go:333] Stat fs failed. Error: exit status 1: "/sbin/zfs zfs get -Hp all lxd/containers/juju-f96834-9" => /dev/zfs and /proc/self/mounts are required.
Feb 18 15:38:08 juju-f96834-9 kubelet[1505]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
I noticed /dev/zfs does not exist in the workers, so I tried adding it:
lxc config device add juju-f96834-9 /dev/zfs unix-block path=/dev/zfs
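To double-check that the device node actually showed up inside the container, something like this should do (just a verification sketch):
lxc exec juju-f96834-9 -- ls -l /dev/zfs   # should show a character device node if the passthrough worked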
But back in the container, with strace:
root@juju-f96834-9:~# strace zfs get -Hp all lxd/containers/juju-f96834-9
[snip]
access("/sys/module/zfs", F_OK) = 0
access("/sys/module/zfs", F_OK) = 0
open("/dev/zfs", O_RDWR) = -1 ENXIO (No such device or address)
write(2, "The ZFS modules are not loaded.\n"..., 87The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
) = 87
exit_group(1) = ?
+++ exited with 1 +++
I guess it may have something to do with unprivileged containers and the host ZFS? Let me know if you need more information, thanks again!
So we do modify the LXD profile to allow certain kernel modules into the containers. Do we just need the ZFS modules loaded? I can easily add those in there.
Can I do anything to help you debug this issue?
You can try updating the profile:
lxc profile edit juju-f96834-9
and adding the necessary kernel modules to linux.kernel_modules. If you need access to /proc as well, make sure raw.lxc in that same profile looks like:
raw.lxc: |
  lxc.aa_profile=unconfined
  lxc.mount.auto=proc:rw sys:rw
Let me know how that goes, and if it works I can update the profile accordingly.
Here's a list of the modules loaded by default after installing zfsutils-linux:
zfs 2813952 3
zunicode 331776 1 zfs
zcommon 57344 1 zfs
znvpair 90112 2 zfs,zcommon
spl 102400 3 zfs,zcommon,znvpair
zavl 16384 1 zfs
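For reference, a listing like this can be reproduced on the host with something along these lines:
lsmod | grep -E '^(zfs|zunicode|zcommon|znvpair|spl|zavl)'   # ZFS-related kernel modules currently loaded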
It seems those profile settings were already there:
$ lxc profile edit juju-conjure-up-canonical-kubernetes-0e8
config:
  boot.autostart: "true"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay
  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /dev/null
    type: disk
  aadisable1:
    path: /sys/module/apparmor/parameters/enabled
    source: /dev/null
    type: disk
  root:
    path: /
    pool: lxd
    type: disk
name: juju-conjure-up-canonical-kubernetes-0e8
used_by:
- /1.0/containers/juju-f96834-0
- /1.0/containers/juju-f96834-1
- /1.0/containers/juju-f96834-2
- /1.0/containers/juju-f96834-3
- /1.0/containers/juju-f96834-4
- /1.0/containers/juju-f96834-5
- /1.0/containers/juju-f96834-7
- /1.0/containers/juju-f96834-8
- /1.0/containers/juju-f96834-9
The ZFS modules aren't listed there though; can you add those?
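Something like this should append them without a full interactive edit (just a sketch, using the profile name from above):
lxc profile set juju-conjure-up-canonical-kubernetes-0e8 linux.kernel_modules ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,zfs,zunicode,zcommon,znvpair,spl,zavl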
Oh, my bad. Here it is:
config:
  boot.autostart: "true"
  linux.kernel_modules: ip_tables,ip6_tables,netlink_diag,nf_nat,overlay,zfs,zunicode,zcommon,znvpair,spl,zavl
  raw.lxc: |
    lxc.aa_profile=unconfined
    lxc.mount.auto=proc:rw sys:rw
  security.nesting: "true"
  security.privileged: "true"
description: ""
devices:
  aadisable:
    path: /sys/module/nf_conntrack/parameters/hashsize
    source: /dev/null
    type: disk
  aadisable1:
    path: /sys/module/apparmor/parameters/enabled
    source: /dev/null
    type: disk
  root:
    path: /
    pool: lxd
    type: disk
name: juju-conjure-up-canonical-kubernetes-0e8
used_by:
- /1.0/containers/juju-f96834-0
- /1.0/containers/juju-f96834-1
- /1.0/containers/juju-f96834-2
- /1.0/containers/juju-f96834-3
- /1.0/containers/juju-f96834-4
- /1.0/containers/juju-f96834-5
- /1.0/containers/juju-f96834-7
- /1.0/containers/juju-f96834-8
- /1.0/containers/juju-f96834-9
I've removed the /dev/zfs device I manually added and rebooted; here's the syslog:
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-import-scan.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/systemd-udev-settle.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 cloud-init[56]: Cloud-init v. 0.7.8 running 'init-local' at Mon, 27 Feb 2017 15:14:53 +0000. Up 7.0 seconds.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Started Initial cloud-init job (pre-networking).
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Reached target Network (Pre).
Feb 27 15:15:13 juju-f96834-9 zpool[410]: /dev/zfs and /proc/self/mounts are required.
Feb 27 15:15:13 juju-f96834-9 zpool[410]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to start Import ZFS pools by device scanning.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Unit entered failed state.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-mount.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Starting Mount ZFS filesystems...
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/cloud-init-local.service: Operation not permitted
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Time has been changed
Feb 27 15:15:13 juju-f96834-9 zfs[419]: /dev/zfs and /proc/self/mounts are required.
Feb 27 15:15:13 juju-f96834-9 zfs[419]: Try running 'udevadm trigger' and 'mount -t proc proc /proc' as root.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Failed to start Mount ZFS filesystems.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Dependency failed for ZFS startup target.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs.target: Job zfs.target/start failed with result 'dependency'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Unit entered failed state.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
Feb 27 15:15:13 juju-f96834-9 systemd[1]: Reached target Local File Systems.
If I mount the host's /dev/zfs into the container (I'm not sure that's even a good idea?):
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-import-scan.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/cloud-init-local.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/systemd-udev-settle.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 zpool[403]: The ZFS modules are not loaded.
Feb 27 15:20:31 juju-f96834-9 zpool[403]: Try running '/sbin/modprobe zfs' as root to load them.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to start Import ZFS pools by device scanning.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Unit entered failed state.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-import-scan.service: Failed with result 'exit-code'.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to reset devices.list on /system.slice/zfs-mount.service: Operation not permitted
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Starting Mount ZFS filesystems...
Feb 27 15:20:31 juju-f96834-9 zfs[404]: The ZFS modules are not loaded.
Feb 27 15:20:31 juju-f96834-9 zfs[404]: Try running '/sbin/modprobe zfs' as root to load them.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Main process exited, code=exited, status=1/FAILURE
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Failed to start Mount ZFS filesystems.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: Dependency failed for ZFS startup target.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs.target: Job zfs.target/start failed with result 'dependency'.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Unit entered failed state.
Feb 27 15:20:31 juju-f96834-9 systemd[1]: zfs-mount.service: Failed with result 'exit-code'.
I'm still getting the same errors from kubelet; let me know if this helps!
@stgraber do you know what it takes to get zfs loaded inside the container?
ZFS doesn't support any kind of namespacing, so you absolutely DO NOT want it to work from inside a container.
If /dev/zfs is available with write access inside the container and you tweak things so that the tools work, what you'll see is the HOST view of ZFS. All the mountpoints listed will be the host mount points and any volume creation/removal will affect the host, not the container.
I think the better question here is: why does kubelet need the zfs commands to check disk space?
I've been looking into it and arrived at the google/cadvisor project, which kubelet uses to gather stats:
https://github.com/google/cadvisor/blob/ba33b5a25bfd1a4e627093ef080872cad627e028/fs/fs.go#L322
I will go and raise an issue with them. In the meantime I guess I could use LXD with another storage backend. Thank you very much for your help 👍
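For anyone else hitting this: on a fresh host, picking a non-ZFS storage backend when initializing LXD should sidestep the zfs requirement entirely; a rough sketch:
sudo lxd init   # choose "dir" instead of "zfs" when asked for the storage backend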
@adrien-f Thanks for the report; let us know if we can be of further help.