Add support for ZFS encryption
Supersedes #373. Description copied:
Why
As I stated a while ago in #218, I would like clevis to be able to unlock ZFS datasets that have native encryption enabled. This is my attempt at adding this by storing the data in zfs properties.
How
This is achieved by storing the clevis data (output of `clevis-encrypt`) in ZFS User Properties. From `zfsprops(8)`:

> User Properties
>
> In addition to the standard native properties, ZFS supports arbitrary user properties. User properties have no effect on ZFS behavior, but applications or administrators can use them to annotate datasets (file systems, volumes, and snapshots).
>
> User property names must contain a colon (":") character to distinguish them from native properties. They may contain lowercase letters, numbers, and the following punctuation characters: colon (":"), dash ("-"), period ("."), and underscore ("_"). The expected convention is that the property name is divided into two portions such as module:property, but this namespace is not enforced by ZFS. User property names can be at most 256 characters, and cannot begin with a dash ("-").
>
> When making programmatic use of user properties, it is strongly suggested to use a reversed DNS domain name for the module component of property names to reduce the chance that two independently-developed packages use the same property name for different purposes.
>
> The values of user properties are arbitrary strings, are always inherited, and are never validated. All of the commands that operate on properties (zfs list, zfs get, zfs set, and so forth) can be used to manipulate both native properties and user properties. Use the zfs inherit command to clear a user property. If the property is not defined in any parent dataset, it is removed entirely. Property values are limited to 8192 bytes.

Properties
All clevis user properties are prefixed with `latchset.clevis`:
* one property to check if a dataset is bound: `latchset.clevis:labels`, which should be a space-separated list of bound labels, or absent when unbound.
* one or more properties to store the clevis data: `latchset.clevis.label:LABEL_NAME[-N]`, where `-N` is an integer suffix used when the data for label LABEL_NAME is too large for one property.

If more than 10 properties are needed, the integer is 0-padded so that the properties sort correctly and are easy to combine when unlocking.

As noted above (at the end of the quote), the value of a user property is limited to 8192 bytes. A simple 1-host `tang` setup will probably not go over this limit, but a more complicated setup with `clevis-encrypt-sss` might.

Because of that, the clevis data is split into 8k chunks and saved in multiple user properties. These are combined upon unlock.
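The splitting scheme described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names `split_chunks` and `join_chunks`, the `property=value` line format, and the choice to start numbering at 0 are assumptions made for the example. Only the property-name pattern, the zero-padding and the chunk size come from the description.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the chunking scheme; not the PR's code.

# split_chunks LABEL DATA [CHUNK_SIZE]
# Prints one "property=value" line per chunk. A single chunk gets no suffix;
# several chunks get a zero-padded "-N" suffix so a plain sort orders them.
# Assumes DATA is ASCII without whitespace (as a compact JWE is).
split_chunks() {
    local label="$1" data="$2" size="${3:-8000}"
    local count last fmt i chunk
    count=$(( (${#data} + size - 1) / size ))
    if [ "$count" -le 1 ]; then
        printf 'latchset.clevis.label:%s=%s\n' "$label" "$data"
        return 0
    fi
    last=$((count - 1))
    # Pad to the number of digits in the largest index.
    fmt="latchset.clevis.label:%s-%0${#last}d=%s\n"
    i=0
    printf '%s\n' "$data" | fold -w "$size" | while IFS= read -r chunk; do
        # shellcheck disable=SC2059  # the format string is built above on purpose
        printf "$fmt" "$label" "$i" "$chunk"
        i=$((i + 1))
    done
}

# join_chunks: reads "property=value" lines on stdin, sorts them by property
# name, and prints the concatenated values.
join_chunks() {
    sort -t= -k1,1 | cut -d= -f2- | tr -d '\n'
}
```

On a real dataset each printed line would correspond to one `zfs set property=value DATASET` call, and the input to `join_chunks` would come from reading the properties back with `zfs get`.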
"Works": (works on my machine, needs more testing)
* binding ZFS dataset with `clevis-zfs-bind`
* unbinding ZFS dataset with `clevis-zfs-unbind`
* testing and unlocking ZFS dataset with `clevis-zfs-unlock`
* splitting and combining zfs-properties (tested with a limit of 800 instead of 8000)

To Do:
* [ ] manpages
* [ ] initramfs hooks
* [ ] rebinding support? (like clevis-luks-rebind)
* [x] Maybe: multiple "slots" support. Currently only one "slot" is available. Added label support
* [ ] clean up commits if this is not squashed
Further work by @lowjoel:
- [x] Cleaned up the error messages, fixed lints reported by shellcheck
- [x] Completed support for initramfs. This requires OpenZFS 2.2+ because of https://github.com/openzfs/zfs/commit/6e015933f88fe7ba5de45cf263028de1ee04460a. Dracut has not been tested after my changes.
- [x] Did a full `clevis-zfs-bind`, `clevis-zfs-unlock` (at reboot), and `clevis-zfs-unbind` cycle
- [x] PPA available for Ubuntu 22.04
- [ ] Integrate clevis-zfs-test with Meson
Once we're happy with the code I can squash the commits to those by @techhazard and myself.
Would there be any instructions, how this can be tested?
I'd like to aid in this implementation, but am unsure how.
Is there something that can be done by third-parties?
> Would there be any instructions, how this can be tested?
I've packaged my changes for Ubuntu in a PPA. After you install that, it's essentially just using clevis-zfs-bind with the same Clevis configuration parameters. The package already triggers initramfs to be regenerated automatically.
> I'd like to aid in this implementation, but am unsure how.
Testing would be good. But I only have this specific initramfs configuration.
Thank you for the quick reply. I have found the PPA. Unfortunately I have never used Clevis and cannot unpack the sentence

> just using `clevis-zfs-bind` with the same Clevis configuration parameters

to infer what a test setup would look like.
I'm able to set up an LXD/Incus virtual network that contains an Ubuntu VM/system container and a block device that contains an encrypted ZFS pool, plus a separate instance for Clevis.
As a tester I would ask myself:
- Which are the steps to set up the Clevis server to provide valid ZFS decryption keys?
- Which are the steps to set up the Ubuntu client to pull the encryption keys in initramfs?
I am having the intuition that bind, unlock and unbind are the hooks that make this work, but I am left asking myself how.
Would it seem suitable to add some documentation with this cycle? The information could go to the clevis.1.adoc man page, a new one like clevis-zfs.1.doc or the existing clevis-decrypt.1.adoc, plus into the README.md or INSTALL.md. This would allow to assure that the presence of the feature is also reflected in documentation.
> to infer what a test setup would look like.

Ah, OK. Have a look at the Arch wiki. In my case, I want to bind the key with both my TPM and with a Yubikey, so I used `clevis-zfs-bind -d rpool -l boot sss '{"t": 2, "pins": {"tpm2": {"pcr_ids": "0"}, "yubikey": {"challenge_size": 64, "salt_size": 64}}}'`.
> * Which are the steps to set up the Clevis server to provide valid ZFS decryption keys?
If you want a server, I think you're referring to Tang. I haven't set that up; my key unlock is purely held on the TPM and with a Yubikey, there's no network involved in my setup.
> * Which are the steps to set up the Ubuntu client to pull the encryption keys in initramfs?
>
> I am having the intuition that `bind`, `unlock` and `unbind` are the hooks that make this work, but I am left asking myself how.
Essentially Clevis does most of the heavy lifting. All this PR does is teach Clevis how to unlock ZFS datasets, which is why there isn't significant documentation. The two arguments:
- `-d` tells Clevis which dataset you want to unlock. In my case, rpool (ZFS on Root)
- `-l` tells Clevis where to store the actual ciphertext containing the key to unlock the ZFS dataset. I used the label `boot`, which you can verify using `zfs get latchset.clevis.label:boot rpool`
clevis-zfs-unlock is called at initramfs to essentially scan for labels and try unlocking the datasets. clevis-zfs-unbind just removes the labels from the dataset.
> Would it seem suitable to add some documentation with this cycle? The information could go to the `clevis.1.adoc` man page, a new one like `clevis-zfs.1.doc` or the existing `clevis-decrypt.1.adoc`, plus into the README.md or INSTALL.md. This would allow to assure that the presence of the feature is also reflected in documentation.
Calling clevis-zfs-bind --help also prints that documentation but I should figure that out. I'd like some confirmation that it works for someone other than me before I go write it 😅
I'm on it, but give me a few months. Thanks for the link and for helping me understand the relationship between Tang and Clevis better.
this would be a huge help for our organization 👍
@lowjoel any help you need to wrap this pr up ?
More testing and code review would be good. I've been using it for half a year now, and it works well for me.
@lowjoel As you are requesting more testing I will be playing around with this shortly. I am not an Ubuntu user but a Debian one. My plan is PVE on an encrypted ZFS mirror. I have taken a look at your .debs and, other than your packages, the requirements and versions are the same as on Debian, so I should have no problem using them. As there will be no LUKS on the intended test system I should be alright with only your clevis, clevis-zfs, and clevis-initramfs-zfs packages with versions 21-1~202409290907~ubuntu24.04.1. Do you envisage I would need more of your packages? My planned steps would be along the lines of:
# apt install libjose0 cracklib-runtime jose libcrack2
# dpkg -i clevis_21-1~202409290907~ubuntu24.04.1_amd64.deb clevis-zfs_21-1~202409290907~ubuntu24.04.1_amd64.deb clevis-initramfs-zfs_21-1~202409290907~ubuntu24.04.1_amd64.deb
# clevis-zfs-bind -d rpool -l boot tang '{"url": "http://tang.server.ip/"}'
@deatharse that looks about right! The changes here are written in shell so you probably could get away with just installing your distro clevis, and then just manually install using dpkg clevis-zfs and clevis-initramfs-zfs
Thanks, I will let you know how I get on as it would be really nice to get this merged.
> you probably could get away with just installing your distro clevis, and then just manually install using dpkg `clevis-zfs` and `clevis-initramfs-zfs`
Not so as clevis-zfs has a hard dependency on your version
Depends: clevis (= 21-1~202409290907~ubuntu24.04.1), zfsutils-linux
Which isn't a problem, as the Debian version hasn't been updated to 21 yet anyway.
@lowjoel So I've had a play. Much like my earlier suggested plan I:
- installed PVE on a single disk with ZFS and converted it to an encrypted mirror
- installed your packages:
  - clevis_21-1~202409290907~ubuntu24.04.1_amd64.deb
  - clevis-initramfs-zfs_21-1~202409290907~ubuntu24.04.1_amd64.deb
  - clevis-zfs_21-1~202409290907~ubuntu24.04.1_amd64.deb
- ran `clevis-zfs-bind -d rpool -l boot01 tang '{"url": "http://tang.server.ip/"}'`
- accepted the key
- ran `update-initramfs -u -k all`
- rebooted
Now I'm not sure if the following is due to PVE or not, but upon reboot I received:
Error communicating with server http://tang.server.ip/
Key load error: encryption failure
Enter passphrase for 'rpool':
Okay, so I configured initramfs by adding a file /etc/initramfs-tools/conf.d/ip.conf with contents in the form:
IP={{ ip }}::{{ gateway }}:{{ netmask }}:{{ hostname }}:{{ interface }}:off
Updated initramfs again and rebooted.
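For readers unfamiliar with the templated `IP=` line above, here is a small sketch that fills it in. The helper name `render_ip_conf` and the addresses in the usage note are made up for illustration; the field order is taken directly from the template, and the empty second field appears to be the unused server address of the kernel-style `ip=` syntax.

```shell
#!/bin/sh
# Hypothetical helper: renders the IP= line in the field order of the
# template above (ip, <empty>, gateway, netmask, hostname, interface, off).
render_ip_conf() {
    printf 'IP=%s::%s:%s:%s:%s:off\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example, `render_ip_conf 192.168.1.10 192.168.1.1 255.255.255.0 pve eth0` prints `IP=192.168.1.10::192.168.1.1:255.255.255.0:pve:eth0:off`, which could be written to `/etc/initramfs-tools/conf.d/ip.conf` before regenerating the initramfs.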
Same thing occurred, however after entering the passphrase I saw the network configured output so it seemed like it was processing out-of-order. I decided to install and configure dropbear using only the minimal required packages:
- dropbear-bin
- dropbear-initramfs
- libtomcrypt1
- libtommath1
Configured dropbear, updated initramfs again, and rebooted. Success, unlocked using the tang server.
I know it's advisable, but I do not remember needing the additional steps with LUKS; that was likely on plain Debian, though. I will have a further play later by setting up Debian with ZFS on root.
I am not an "official reviewer" for the project, but one thing I would suggest is that, to stay consistent with clevis-luks-bind, your clevis-zfs-bind should implement the -y parameter to "Automatically answer yes for all questions". I know this is not necessary from an unattended point of view, as you pass the fingerprint as part of the tang config JSON; however, for the sake of consistency I would implement it.
I hope this is helpful.
@lowjoel Another consistency point @sarroutbi or @sergio-correia would probably pick up on as a nit-pick is the needless use of the function keyword in your bash scripts.
@lowjoel one possible bug I have noticed: if you pass the -f flag ("Do not prompt when overwriting configuration") and there is no current config (i.e. the label does not currently exist), the clevis zfs bind command exits with a status code of 1 and the message:
/usr/libexec/clevis-zfs-common: line 135: new_labels: unbound variable
It seems counterintuitive to me that this flag would cause it to fail in this circumstance.
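The message above is what a shell running under `set -u` prints when it expands a variable that was never assigned. The sketch below shows that general class of failure and one common guard. It is a guess for illustration, not a reconstruction of line 135 of `clevis-zfs-common`: the function `labels_after_bind` and its arguments are hypothetical, and only the variable name `new_labels` comes from the error message.

```shell
#!/bin/sh
# Hypothetical illustration; not the code at clevis-zfs-common line 135.

# labels_after_bind EXISTING_LABELS LABEL
# Prints the label list after binding LABEL. The body runs in a subshell so
# that "set -u" does not leak into the caller.
labels_after_bind() (
    set -u                  # the strict mode that produces "unbound variable"
    unset new_labels        # start from a genuinely unset variable
    existing="${1:-}"
    label="$2"
    # Only assigned when the dataset was already bound to something.
    if [ -n "$existing" ]; then
        new_labels="$existing"
    fi
    # "${new_labels:+...}" expands safely even when new_labels is unset,
    # whereas a bare "$new_labels" would abort the script here.
    printf '%s\n' "${new_labels:+$new_labels }$label"
)
```

With no existing binding, `labels_after_bind '' boot01` prints `boot01` instead of aborting. Initialising the variable to an empty string before first use would work equally well.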
From further investigation the need for properly configured dropbear to bring up the network is due to clevis-initramfs-zfs_21-1~202409290907~ubuntu24.04.1_amd64.deb missing those capabilities. The normal clevis-initramfs has the files:
- `/usr/share/initramfs-tools/scripts/local-top/clevis`
- `/usr/share/initramfs-tools/scripts/local-bottom/clevis`

In `/usr/share/initramfs-tools/scripts/local-top/clevis`, the main `clevisloop()` invokes `do_configure_networking()` via
if [ $netcfg_attempted -eq 0 ] && has_tang_pin ${CRYPTTAB_SOURCE}; then
    netcfg_attempted=1
    do_configure_networking
fi
where there are obvious LUKS dependencies:
has_tang_pin() {
    local dev="$1"

    clevis luks list -d "${dev}" | grep -q tang
}
Replicating both of these files and modifying the bottom of /usr/share/initramfs-tools/scripts/local-top/clevis to be:
- clevisloop &
- echo $! >/run/clevis.pid
+ do_configure_networking
Allows it to work as expected after updating initramfs.
It might be worthwhile abstracting the networking functions into a clevis-initramfs-common library both clevis-initramfs and clevis-initramfs-zfs can use if the project maintainers would allow it.
So I've found an issue with unbinding.
If you do not use sss and bind 2 tang servers to two labels e.g.
# clevis zfs bind -d rpool -l boot01 tang '{"url": "http://192.168.1.2/"}'
# clevis zfs bind -d rpool -l boot02 tang '{"url": "http://192.168.1.3/"}'
You end up with 3 properties of the form:
# zfs get all rpool | tail -3
rpool latchset.clevis.label:boot01 [tang server 1 data] local
rpool latchset.clevis:labels boot01 boot02 local
rpool latchset.clevis.label:boot02 [tang server 2 data] local
you can then unbind them in the reverse order:
# clevis zfs unbind -l boot02 -d rpool
Loading existing key...
Enter existing ZFS password for rpool:
Wiping Clevis data... ok
This unbinds boot02: it removes the label property and removes boot02 from latchset.clevis:labels.
# clevis zfs unbind -l boot01 -d rpool
Loading existing key...
Enter existing ZFS password for rpool:
Wiping Clevis data... ok
This unbinds boot01, removing its label property and latchset.clevis:labels itself.
However, if you unbind in sequential order, boot01 then boot02:
# clevis zfs unbind -l boot01 -d rpool
Loading existing key...
Enter existing ZFS password for rpool:
Wiping Clevis data... ok
This removes the boot01 label property and also latchset.clevis:labels, even though boot02 is still bound.
Therefore, attempting
# clevis zfs unbind -l boot02 -d rpool
ERROR: dataset is not bound with Clevis: rpool
Usage: clevis zfs unbind [-f] [-k KEY] -d DATASET [-a] -l LABEL
Unbinds a label from a ZFS dataset:
-f Force unbinding dataset
-d DATASET The ZFS dataset on which to perform unbinding
-a Unbind all labels
-l LABEL The label to unbind
-k KEY Non-interactively read ZFS password from KEY file
-k - Non-interactively read ZFS password from standard input
The latchset.clevis.label:boot02 property still exists and can only be removed via
# zfs inherit latchset.clevis.label:boot02 rpool
If, with both labels bound, you attempt to unbind all:
# clevis zfs unbind -a -d rpool
this removes the label for boot01 and latchset.clevis:labels, leaving latchset.clevis.label:boot02 behind, just as though you had only unbound boot01.
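The behaviour one would presumably expect here is that unbinding a label drops only that label from `latchset.clevis:labels`, and that the list property is cleared only once no labels remain. A minimal sketch of that list handling follows; `remove_label` is a hypothetical helper written for this comment, not a function from the PR.

```shell
#!/bin/sh
# Hypothetical helper; not taken from clevis-zfs-common.

# remove_label LABELS LABEL
# Prints the space-separated LABELS with LABEL removed. Prints nothing when
# no labels remain. Label names are assumed to contain no glob characters.
remove_label() {
    local labels="$1" target="$2" out="" l
    for l in $labels; do
        [ "$l" = "$target" ] && continue
        out="${out:+$out }$l"
    done
    printf '%s' "$out"
}

# Usage sketch (needs a real dataset, so it is left commented out):
#   remaining=$(remove_label "$(zfs get -H -o value latchset.clevis:labels rpool)" boot01)
#   if [ -n "$remaining" ]; then
#       zfs set latchset.clevis:labels="$remaining" rpool
#   else
#       zfs inherit latchset.clevis:labels rpool
#   fi
```

With this approach, unbinding boot01 first would leave `latchset.clevis:labels` set to `boot02`, so the later unbind of boot02 would still find the dataset bound.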
Thanks for this @deatharse -- you've given lots of good observations. Let me find a weekend to address them.
I've had a bit of a further play by creating a second encrypted pool and noticed that it was not unlocked.
# zpool create \
-o ashift=12 \
-o autotrim=on \
-O encryption=on -O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O normalization=formD \
-O relatime=on \
-O canmount=off \
-O recordsize=1M storage-pool /dev/vdc
# zfs create storage-pool/backups
I noticed that src/initramfs-tools/scripts/zfs-load-key/clevis-zfs.in is copied to /etc/zfs/initramfs-tools-load-key.d/clevis-zfs, so I started playing around with what was available via my dropbear connection and making some local modifications to see what I could figure out (I have no idea how the variable ENCRYPTIONROOT is set, but carried on). I have drawn inspiration from the file dracut/60clevis-zfs/clevis-zfs-hook.sh and from the main repo's src/luks/dracut/clevis/clevis-luks-unlocker to come up with a diff that allows unlocking non-root pools.
The diff for src/initramfs-tools/scripts/zfs-load-key/clevis-zfs.in is currently:
22c22,41
< clevis zfs unlock -d "${ENCRYPTIONROOT}"
---
> zpool import -a
>
> for pool in $(zpool list -H -o name); do
>     # if pool encryption is active and the zfs command understands '-o encryption'
>     if [ "$(zpool list -H -o feature@encryption "${pool}")" = 'active' ]; then
>         # if the root dataset has encryption enabled
>         ENCRYPTIONROOT=$(zfs get -H -o value encryptionroot "${pool}")
>         if ! [ "${ENCRYPTIONROOT}" = "-" ]; then
>             KEYSTATUS="$(zfs get -H -o value keystatus "${ENCRYPTIONROOT}")"
>             # continue only if the key needs to be loaded
>             [ "$KEYSTATUS" = "unavailable" ] || continue
>             # decrypt them
>             TRY_COUNT=5
>             while [ $TRY_COUNT -gt 0 ]; do
>                 clevis zfs unlock -d "${ENCRYPTIONROOT}" && break
>                 TRY_COUNT=$((TRY_COUNT - 1))
>             done
>         fi
>     fi
> done
N.B. I have yet to create an unencrypted pool with an encrypted dataset, maybe I will get round to that on a weekend.
Here's an updated diff for src/initramfs-tools/scripts/zfs-load-key/clevis-zfs.in that will handle encrypted datasets in an unencrypted pool:
22c22,58
< clevis zfs unlock -d "${ENCRYPTIONROOT}"
---
> attempt_unlock() {
>     local dataset=$1
>
>     KEYSTATUS=$(zfs get -H -o value keystatus "${dataset}")
>     # proceed only if the key needs to be loaded
>     [ "$KEYSTATUS" = "unavailable" ] || return 0
>     # decrypt them
>     TRY_COUNT=5
>     while [ $TRY_COUNT -gt 0 ]; do
>         clevis zfs unlock -d "${dataset}" && break
>         TRY_COUNT=$((TRY_COUNT - 1))
>     done
> }
>
> zpool import -a
> for pool in $(zpool list -H -o name); do
>     # if pool encryption is active and the zfs command understands '-o encryption'
>     if [ "$(zpool list -H -o feature@encryption "${pool}")" = 'active' ]; then
>         # if the root dataset has encryption enabled
>         ENCRYPTIONROOT=$(zfs get -H -o value encryptionroot "${pool}")
>         if ! [ "${ENCRYPTIONROOT}" = "-" ]; then
>             attempt_unlock "${ENCRYPTIONROOT}"
>         else
>             # encryption in child dataset, lets get list of datasets in pool
>             for dataset in $( zfs list -r -H -o name "${pool}" ); do
>                 # first entry will not be a child so we will ignore it
>                 if [ "${dataset}" != "${pool}" ]; then
>                     # test for encrypted dataset
>                     ENCRYPTIONROOT=$(zfs get -H -o value encryptionroot "${dataset}")
>                     if ! [ "${ENCRYPTIONROOT}" = "-" ]; then
>                         attempt_unlock "${ENCRYPTIONROOT}"
>                     fi
>                 fi
>             done
>         fi
>     fi
> done
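The retry loop in the diff above could also be factored into a small reusable helper, which additionally reports failure once all attempts are used up. This is only a sketch: the name `retry` and its calling convention are assumptions for illustration and are not part of the PR or of Clevis.

```shell
#!/bin/sh
# Hypothetical helper; not part of the PR.

# retry N CMD [ARGS...]
# Runs CMD up to N times and stops at the first success.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    local tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
    done
    return 1
}

# Usage sketch inside the hook (left commented out, needs clevis and zfs):
#   retry 5 clevis zfs unlock -d "${dataset}"
```

Returning a non-zero status after the last attempt would let the caller decide whether to fall back to the passphrase prompt.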
Created test datasets via:
# zpool create \
-o ashift=12 \
-o autotrim=on \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O normalization=formD \
-O relatime=on \
-O canmount=off \
-O recordsize=1M storage-pool2 /dev/vdd
# zfs create storage-pool2/unencrypted
# zfs create -o encryption=on -o keylocation=prompt -o keyformat=passphrase storage-pool2/encrypted
# zfs create storage-pool2/encrypted/backup
# clevis zfs bind -d storage-pool2/encrypted -l boot02 tang '{"url": "http://192.168.1.3/"}'
Just curious here why the initramfs-tools integration has not added a script into /etc/zfs/initramfs-tools-load-key.d/ (mentioned here) to load the key via clevis? That would allow using clevis together with zfs-initramfs package on Debian.
The zfs-initramfs package code executed in initramfs is here.
The recently linked issue has a comment https://github.com/latchset/clevis/pull/462#issuecomment-3289354932 that reads:
> I see some code duplication in the ZFS initramfs hook, it installs basically the same binaries as the regular clevis hook.
Would that observed behaviour of this branch be intentional, misunderstood or an error?
> Just curious here why the initramfs-tools integration has not added a script into `/etc/zfs/initramfs-tools-load-key.d/` (mentioned here) to load the key via `clevis`? That would allow using `clevis` together with `zfs-initramfs` package on Debian.
>
> The zfs-initramfs package code executed in initramfs is here.
Yep, that is my current workaround:
#!/bin/sh
# /etc/zfs/initramfs-tools-load-key.d/clevis
# Unlock a ZFS encryption root using Clevis-wrapped key material.
#
# How to provision the clevis properties on a dataset:
# Example (TPM+SB):
# 1. Generate a new wrapping for your dataset key:
# - For passphrase keyformat:
# echo -n 'your-passphrase' | clevis encrypt tpm2 '{"pcr_bank":"sha256","pcr_ids":"7"}' > passphrase.jwe
# zfs set clevis:passphrase="$(cat passphrase.jwe)" pool/dataset
#
# - For hex keyformat:
# echo -n 'your-hex-key' | clevis encrypt {tpm2,tang,sss} '{/*config*/}' > hex.jwe
# zfs set clevis:hex="$(cat hex.jwe)" pool/dataset
#
# - For raw keyformat:
# cat your-raw.key | clevis encrypt {tpm2,tang,sss} '{/*config*/}' > raw.jwe
# zfs set clevis:raw="$(cat raw.jwe)" pool/dataset
#
# 2. Verify:
# zfs get clevis:passphrase pool/dataset
# zfs get clevis:hex pool/dataset
# zfs get clevis:raw pool/dataset
#
# 3. Ensure the dataset's keyformat matches the property you set.
# Example: zfs get keyformat pool/dataset
#
# Environment (from zfs-initramfs):
# ENCRYPTIONROOT : encryption root dataset
# ZFS : helper to run the zfs binary
#
# Contract:
# Return 0 if we did nothing or successfully unlocked; 1 if we tried and failed.
# Helper: read a property; empty if '-' or error.
get_prop() {
    val="$($ZFS get -H -o value "$1" "$ENCRYPTIONROOT" 2>/dev/null || true)"
    [ "$val" = "-" ] && val=""
    printf '%s' "$val"
}

# passphrase | hex | raw | none
KEYFORMAT="$(get_prop keyformat)"

# Build clevis property name directly from keyformat
CLEVIS_PROP="clevis:$KEYFORMAT"

# Fetch the JWE from that property; log whether found or not.
PROP="$(get_prop "$CLEVIS_PROP")"
if [ -n "$PROP" ]; then
    log_success_msg "ZFS: found $CLEVIS_PROP on $ENCRYPTIONROOT"
else
    log_warning_msg "ZFS: no $CLEVIS_PROP found on $ENCRYPTIONROOT"
    return 0
fi

# Optional: log the Clevis pin type (e.g., tpm2/tang/sss) if jose is present
if command -v jose >/dev/null 2>&1; then
    PIN="$(printf %s "$PROP" \
        | jose jwe fmt -i- \
        | jose fmt -j- -Og protected -yOg clevis -Og pin -Su- 2>/dev/null || true)"
    if [ -n "$PIN" ]; then
        log_success_msg "ZFS: clevis pin=$PIN keyformat=$KEYFORMAT for $ENCRYPTIONROOT"
    fi
fi

# Need clevis to decrypt
if ! command -v clevis >/dev/null 2>&1; then
    log_warning_msg "ZFS: clevis not available in initramfs; skipping"
    return 0
fi

# Unlock using clevis → zfs (zfs interprets bytes according to keyformat)
log_begin_msg "ZFS: unlocking $ENCRYPTIONROOT with clevis ($CLEVIS_PROP)"
printf %s "$PROP" \
    | clevis decrypt \
    | $ZFS load-key -L prompt "$ENCRYPTIONROOT"
ret=$?
log_end_msg $ret
return $ret