Support for Talos
Because of its dependency on OS binaries such as `mount` and `mkdir`, Trident cannot be used with more sophisticated/progressive operating systems like Talos. Will this dependency be lifted at some point?
Would also be very interested to know if this would be possible at some point.
I am also interested in using Astra Trident on a Kubernetes cluster using Talos. Do you have a solution?
I work for a large retailer (150+ ONTAP clusters), and we'd really like to see Talos support. It would keep NetApp in the running as we vet the best storage option for our Kubernetes hybrid cloud environment.
We plan to move everything in that direction.
Yes, this is needed. Shelling out on the nodes is not a good option. Getting rid of this dependency would benefit all Linux distros, not just Talos, because they would need far fewer tools installed.
+1 for Talos support.
+1 for Talos support.
+1 for Talos support.
For what it's worth, I managed to mount a Trident share on Talos by using a `debian:latest` base image in the Dockerfile (cf. this commit). This makes the basic binaries needed to mount NFS shares (e.g. `mkdir`, `mount`, `mount.nfs`) accessible. It is not ideal, because those binaries are probably tied in some way to the host kernel version, but as a workaround it does the job.
There are some limitations though:
- NFSv3 with locks is not supported, because the `rpc.statd` daemon is not supported on Talos. This is all documented in https://github.com/siderolabs/talos/issues/6582
- NFSv3 with the `-o nolock` mount option does not work either, and I can't explain why. The error message is `mount.nfs: Protocol not supported`.
- NFSv4 (i.e. mount option `nfsvers=4`) does work 🎉, and it seems I was able to use locks across different nodes (tested with `flock`).
I'm not yet sure whether we are "ready" to move all our current workloads to NFSv4; I have to read this NetApp article on the topic first. At least we now know that it is technically not impossible to mount a Trident NFS share on Talos.
TL;DR: in theory it's possible, but it's tricky, and I'm not going to invest more time in this for the time being.
Here are my latest findings:
- At one point I thought the `Protocol not found` error was caused by the `nfsv3` or `nfsv4` kernel modules not being found, but making them available didn't help. For reference, mounting `/lib/modules` (from Talos) on `/lib/modules` (in the trident-main container) makes those kernel modules discoverable by tools such as `modinfo`.
- The `Protocol not supported` error disappeared when I copied the `/etc/protocols` file from the kubelet rootfs to the trident-main container (to be precise, the file was at `/run/containerd/io.containerd.runtime.v2.task/system/kubelet/rootfs/etc/protocols`; thanks to `strace` for finding that out).
- The `nfs-utils` binaries (which include `mount.nfs` and `rpc.statd`) can be installed as described in this commit, and they do work.
- NFSv3 with locks can work, provided you have started the `rpcbind` and `rpc.statd` daemons.
All of that being said, we are currently putting our Trident exploration on hold and might get back to this issue later. Solving it would require either:
1. building a system extension with the `rpcbind` and `rpc.statd` daemons, which is not trivial, partly because building them from scratch against the `musl` library apparently requires some adaptations; or
2. starting those daemons in a dedicated pod (e.g. in a DaemonSet with `hostNetwork`). However, given how critical those daemons would be with respect to locks, we do not want to venture in this direction.
Option 1 is much cleaner than option 2, but it requires too much development at this stage.
+1 for Talos support.
+1 for Talos support
What protocols do you have an interest in being supported in Talos? This could help with prioritization.
We would use the following protocols in our talos environment:
- NFSv4 (including pNFS)
- NVMe/TCP
+1 for Talos support.
Shelling out is double plus ungood. Why not use Go's `os.MkdirAll` function instead of shelling out to `mkdir`?
I was able to get things to work on Talos by creating my own customized container image using the following Dockerfile:
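```dockerfile
# syntax=docker/dockerfile:1
# (the heredoc RUN below needs BuildKit with Dockerfile syntax 1.4 or later)
ARG TRIDENT_TAG=25.02.1
FROM netapp/trident:${TRIDENT_TAG} AS trident
# Debian stage: only used as a source of userland files
FROM library/debian:latest AS debian
RUN <<EOF
set -e # Fail on any error
apt-get update
apt-get --no-install-recommends install -y netbase nfs-common
EOF
# Final image: the stock Trident image plus the files it needs on Talos
FROM trident AS build
COPY --from=debian /usr/bin/mkdir /usr/bin/
# The node pods' PATH lacks /bin and /sbin, so expose these under /usr/bin
COPY --from=trident /bin/mount /bin/umount /sbin/mount.nfs /sbin/mount.nfs4 /usr/bin/
# netbase provides /etc/protocols and /etc/services
COPY --from=debian /etc/protocols /etc/services /etc/
```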
I provide the following caveats:
- We are only using NFS 4.2, not any of the other capabilities of Trident.
- The `PATH` of the running Trident node pods does not include `/bin/` and `/sbin/`, hence the need for the `COPY --from=trident` statement.
- The Helm chart values must be updated with a `tridentImage:` entry so that the customized container image is pulled.
Has anyone already tried Trident on Talos using iSCSI?
- The connection to NetApp works, and it creates LUNs.
- PVCs can be created and are correctly bound to a PV.
But as soon as we try to mount the volume in a pod, we get: `failed to stage volume: multipathd is not running`
```text
Events:
  Type     Reason                  Age              From                     Message
  ----     ------                  ----             ----                     -------
  Normal   Scheduled               17s              default-scheduler        Successfully assigned default/pvc-tester to sr-os02
  Normal   SuccessfulAttachVolume  16s              attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-5f28c41e-7343-47ad-ba2d-981d295be434"
  Warning  FailedMount             3s (x4 over 7s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-5f28c41e-7343-47ad-ba2d-981d295be434" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: multipathd is not running
```