evaluate materials backup strategy

Open koddo opened this issue 1 year ago • 23 comments

Any particular reason not to use tar+gpg for backups? LUKS seems error-prone compared to a single command of encrypting a folder.

Thank you for the guide, by the way.

koddo avatar Jun 14 '24 12:06 koddo

Indeed, one could symmetrically encrypt the private key and backup contents with GPG (though I would still recommend using a different passphrase from identity). The resulting archive would be more portable and convenient to use.

However, portability and convenience of accessing backups is not a goal of this guide and, in a way, LUKS limits portability to encourage use of a secure-ish operating environment. I also suppose the argument could be made that using different encryption programs to secure key material is "better", but in practice this is likely not relevant. In other words, in terms of at-rest encryption protection, LUKS and GPG are likely equivalent.

I'm curious what you mean by LUKS being error-prone. If it truly is a hassle for readers, we could simplify the instructions and move LUKS to Optional Hardening. What do you think? Is there anything else to consider?

drduh avatar Jun 30 '24 20:06 drduh

Maybe it's just me, but I never interact with LUKS directly, which means I don't know what all those commands mean, and there's already a lot to learn about gpg and yubikey-manager. Also, by error-prone I mean manually manipulating partitions in /dev: commands like sudo dd if=/dev/zero of=/dev/sdc bs=4M count=1 seem dangerous for sleep-deprived users, and it was pure luck that I didn't mess up. Everything else is recoverable.

koddo avatar Jul 09 '24 19:07 koddo

Another option may be to use LUKS with a loop device, which is a file used as a block device (not to be confused with loop-AES, which seems to be an alternative to LUKS). Using a loop device keeps the benefits of the current LUKS steps and adds two, IMHO, significant advantages:

  1. Can be safely automated, addressing koddo's concerns. All destructive operations are restricted to a file, not an entire partition or disk. This is the approach I plan to try with the automation scripts I'm currently implementing for my use case.
  2. It's portable. Being able to back up data remotely may be recommended, depending on a user's specific security threat model, as well as their physical environment, such as living in a natural disaster prone region.

Disadvantages:

  1. If a user chooses not to use an ephemeral environment and they forget to delete the encrypted file after copying it to portable devices and/or remote storage, the file may be more vulnerable to exfiltration than the current physical disk based approach but this threat seems minor compared to the advantages and the file is encrypted.
  2. Nothing else obvious or significant that I can think of. However, I have minimal experience with LUKS and loop devices so it would be great if someone more experienced could comment on this approach.

I think this approach can also be used with the OpenBSD equivalent, virtual node devices, but I have not tried it.

Linux Steps

The steps have been tested on Debian Live 12.8.0:

009f482430b8505bba8099aeac3f59eab92ddfa30ae8910d6293f7e198194bad24fa625fcfa38813e6e25b2b7502a542fa04af4e4c2a52376c36162fea8debf7  debian-live-12.8.0-amd64-gnome.iso

user@debian:~$ cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
NAME="Debian GNU/Linux"
VERSION_ID="12"
VERSION="12 (bookworm)"
VERSION_CODENAME=bookworm
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
IMAGE_ID=live
BUILD_ID=20241109T101058Z

The loop device specific steps are based on the cryptsetup FAQ:

zcat /usr/share/doc/cryptsetup/FAQ.md.gz|grep -A 10 '2.6 How do I use LUKS with a loop-device?'
  * **2.6 How do I use LUKS with a loop-device?**

  This can be very handy for experiments.  Setup is just the same as with
  any block device.  If you want, for example, to use a 100MiB file as
  LUKS container, do something like this:
```
    head -c 100M /dev/zero > luksfile               # create empty file
    losetup /dev/loop0 luksfile                     # map file to /dev/loop0
    cryptsetup luksFormat --type luks2 /dev/loop0   # create LUKS2 container
```
  Afterwards just use /dev/loop0 as a you would use a LUKS partition.

I have marked the new/modified steps with (changed). All the other steps are existing from the guide.

(changed) Choose a name for the encrypted loop device file:

export LUKS_GPG_BACKUP_FILE="gpg_backup_luks"

(changed) Create a small loop device file for storing secret materials (at least 20 MiB is recommended, to account for the LUKS header size):

head -c 20M /dev/zero > "$LUKS_GPG_BACKUP_FILE"
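
One way to make this step harder to get wrong is to wrap it in a guard. The sketch below is not from the guide: `create_container` is a hypothetical helper that refuses to overwrite any path that already exists (an earlier backup, or a device node such as /dev/sdc) and creates the file with owner-only permissions.

```shell
# Hypothetical helper, not part of the guide: create a zero-filled container
# file of the given size in MiB. The body runs in a subshell so the umask
# change and the variables do not leak into the calling shell.
create_container() (
    file=$1
    size_mib=$2
    if [ -e "$file" ]; then
        echo "refusing to overwrite existing path: $file" >&2
        exit 1
    fi
    umask 077    # owner-only permissions for the new file
    head -c "$((size_mib * 1024 * 1024))" /dev/zero > "$file"
)
```

With this in place, `create_container "$LUKS_GPG_BACKUP_FILE" 20` could stand in for the `head` command above.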

(changed) Associate $LUKS_GPG_BACKUP_FILE with the next available loop device, automatically ensuring the file isn't mapped to multiple loop devices:

sudo losetup -f -L "$LUKS_GPG_BACKUP_FILE"

(changed) Ensure losetup is on the non-sudo user's PATH, which is needed on some OSes such as Debian Live 12.8.0:

export PATH="/usr/sbin:$PATH"

(changed) Determine what loop device was used:

export LUKS_GPG_BACKUP_LOOP_DEVICE=$(losetup -j "$LUKS_GPG_BACKUP_FILE" |cut -d: -f1)

Generate another unique Passphrase (ideally different from the one used for the Certify key) to protect the encrypted volume:

export LUKS_PASS=$(LC_ALL=C tr -dc 'A-Z1-9' < /dev/urandom | \
  tr -d "1IOS5U" | fold -w 30 | sed "-es/./ /"{1..26..5} | \
  cut -c2- | tr " " "-" | head -1) ; printf "\n$LUKS_PASS\n\n"

This passphrase will also be used infrequently to access the Certify key and should be very strong.
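
To put a number on "very strong", the entropy of the generated passphrase can be estimated from the pipeline above. This is a back-of-the-envelope sketch that assumes every character is drawn uniformly at random; `entropy_bits` is a hypothetical helper name.

```shell
# Entropy in bits of N characters, each drawn uniformly at random from an
# alphabet of K symbols: N * log2(K).
entropy_bits() {
    LC_ALL=C awk -v n="$1" -v k="$2" 'BEGIN { printf "%.1f\n", n * log(k) / log(2) }'
}

# The generator keeps A-Z and 1-9 (35 symbols) and then deletes "1IOS5U",
# leaving 29. Of the 30 characters per line, one is cut and five become
# separators, leaving 24 random characters.
entropy_bits 24 29
```

Under those assumptions the passphrase carries roughly 116.6 bits of entropy.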

Write the passphrase down or memorize it.

(changed) Format the loop device:

echo $LUKS_PASS | sudo cryptsetup -q luksFormat "$LUKS_GPG_BACKUP_LOOP_DEVICE"

(changed) Open the LUKS container on the loop device:

echo $LUKS_PASS | sudo cryptsetup -q luksOpen "$LUKS_GPG_BACKUP_LOOP_DEVICE" gnupg-secrets

Create an ext2 filesystem:

sudo mkfs.ext2 /dev/mapper/gnupg-secrets -L gnupg-$(date +%F)

Mount the filesystem and copy the temporary GnuPG working directory with key materials:

sudo mkdir /mnt/encrypted-storage

sudo mount /dev/mapper/gnupg-secrets /mnt/encrypted-storage

sudo cp -av $GNUPGHOME /mnt/encrypted-storage/

(changed) An example to better visualize the relationship between the underlying file gpg_backup_luks, the loop device and the LUKS container:

user@debian:~$ losetup -l -j gpg_backup_luks
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                  DIO LOG-SEC
/dev/loop1         0      0         0  0 /home/user/gpg_backup_luks   0     512

user@debian:~$ lsblk -p /dev/loop1
NAME                        MAJ:MIN RM SIZE RO TYPE  MOUNTPOINTS
/dev/loop1                    7:1    0  20M  0 loop
└─/dev/mapper/gnupg-secrets 253:0    0   4M  0 crypt /mnt/encrypted-storage

(changed) Unmount and close the encrypted volume and detach the loop device:

sudo umount /mnt/encrypted-storage

sudo cryptsetup luksClose gnupg-secrets

sudo losetup -d "$LUKS_GPG_BACKUP_LOOP_DEVICE"

(changed) Copy $LUKS_GPG_BACKUP_FILE to multiple portable storage devices and potentially remote storage. (changed) If you are not using an ephemeral OS, delete the local copy.

Signing

With this approach, I'd also suggest signing the backing file. This would be especially useful if backing up the file to remote storage.
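
A minimal sketch of the copy and signing steps, assuming two hypothetical helper names (`copy_verified`, `sign_backup`) and that GnuPG's default key is the one intended for signing. The byte comparison only confirms that a copy matches the local file; it is not a substitute for the signature.

```shell
# Hypothetical helpers, not part of the guide.

# Copy SRC into DEST_DIR and confirm the copy is byte-identical.
copy_verified() {
    cp "$1" "$2"/ && cmp -s "$1" "$2/$(basename "$1")"
}

# Write a detached, ASCII-armored signature next to FILE (as FILE.asc),
# using GnuPG's default signing key.
sign_backup() {
    gpg --armor --detach-sign "$1"
}
```

The signature could later be checked with `gpg --verify gpg_backup_luks.asc gpg_backup_luks`, for example after retrieving the file from remote storage.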

forbytten avatar Nov 21 '24 08:11 forbytten

@forbytten your approach is novel and perhaps technically superior, but I understand the original request as wanting to reduce overall complexity during setup.

can we do this without also materially impacting the security and durability of materials?

drduh avatar Apr 04 '25 01:04 drduh

You're right that I focused only on dd being dangerous, due to confirmation bias, as dd was my main concern. I'll leave it up to @koddo to comment on whether they view complexity as having the same "badness" weight as dd being dangerous and error prone but in general, I would say that simplicity is best.

The available backup options appear to fall within the following categories:

  1. tar + encrypt, such as tar + GPG. Sub-categories:

    • public key encryption
    • symmetric encryption
  2. filesystem level encryption. Sub-categories:

    • native, supported by the filesystem itself, such as ext4 + fscrypt
    • stacked, layered on top of the real filesystem. Probably little reason to consider these if a native solution exists.
  3. block device level encryption, such as LUKS2. Sub-categories:

    • physical block device (existing YubiKey Guide approach)
    • virtual block device (the loop device approach I suggested)

To elaborate on the ext4 + fscrypt option, I think it is simple to use and I believe provides similar security benefits as the existing approach but does not require the user to create a new partition. However, it does assume they already have an ext4 filesystem to create the encrypted directory on. Some links to the basic usage:

  1. Ensure encryption is enabled on the ext4 filesystem

  2. Set up fscrypt on empty directory

  3. Lock and unlock the directory when you want to close/open it
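
The three steps can be sketched as a short script. This is an illustration rather than a tested procedure: the device, mount point and directory are placeholders supplied by the caller, the `run` wrapper and `fscrypt_backup_dir` are hypothetical names, and whether `sudo` is needed for each fscrypt call depends on the system. With `DRY_RUN=1` the commands are only printed, so the sequence can be reviewed before anything touches a filesystem.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is non-empty,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Arguments: DEVICE MOUNTPOINT DIR (all placeholders chosen by the caller).
fscrypt_backup_dir() {
    device=$1; mountpoint=$2; dir=$3
    # 1. Enable the ext4 "encrypt" feature, then run fscrypt's one-time
    #    global and per-filesystem setup.
    run sudo tune2fs -O encrypt "$device" || return 1
    run sudo fscrypt setup || return 1
    run sudo fscrypt setup "$mountpoint" || return 1
    # 2. Create an empty directory and encrypt it.
    run mkdir -p "$dir" || return 1
    run fscrypt encrypt "$dir" || return 1
    # 3. Lock the directory when finished; "fscrypt unlock DIR" reopens it.
    run fscrypt lock "$dir" || return 1
}
```

For example, `DRY_RUN=1 fscrypt_backup_dir /dev/sdX1 /mnt/data /mnt/data/gnupg-backup` prints the six commands without running them.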

As with many things in security, I don't think there is a single "best" option so I've had a go at a pro/con matrix below, ignoring stacked filesystems. I suggest the "right" choice depends on which criteria you view as taking priority for the guide. However, whatever choice is made - be it the current approach or something new - perhaps something like this matrix could be included in the guide to allow users to explore other options if they aren't comfortable with the recommendation, similar to how Prepare environment covers various options?

| Criteria | tar + GPG public key encryption | tar + GPG symmetric encryption | ext4 + fscrypt | LUKS2 + physical device | LUKS2 + loop device |
| --- | --- | --- | --- | --- | --- |
| Usage complexity (Low/Medium/High) | Low | Low | Medium | Medium | High[^1] |
| Resistance to offline attacks (Low/Medium/High) | Medium[^2] | High[^3] | High[^4] | High[^5] | High[^6] |
| Resistance to image exfiltration (Low/Medium/High) | Medium[^7] | Medium[^7] | High[^8] | High | Medium[^7] |
| Resistance to plaintext persistence (Low/Medium/High) | Low[^9] | Low[^9] | High[^10] | High[^10] | High[^10] |
| Resistance to file corruption[^11] | Dependent on hosting filesystem | Dependent on hosting filesystem | Ext4 journalling | Dependent on encrypted filesystem | Dependent on both hosting filesystem and encrypted filesystem |
| Multiple backups | Copy encrypted file locally or remotely[^12] | Copy encrypted file locally or remotely[^12] | Mount, copy files to independently encrypted target[^13] | Mount, copy files to independently encrypted target | Copy encrypted file locally or remotely[^12] |
| Authenticated encryption | Via GPG --sign option | Via GPG --sign option | No[^14] | No[^15] | Not directly but you can sign the loop device file using GPG. However, having a separate step is prone to inconsistency if you forget to re-sign after changing the data. |
| YubiKey integration | Encrypt, decrypt, sign | Sign | No | Encrypt/decrypt master key via systemd-cryptenroll[^16] | Encrypt/decrypt master key via systemd-cryptenroll[^16] |
| Cross platform access | tar + gpg available on most platforms | tar + gpg available on most platforms | TBD[^17] | TBD[^17] | TBD[^17] |
| Safe for automation | Yes | Yes | Mostly - some filesystem wide changes are required, such as enabling encryption using tune2fs | No - danger of destroying drive partitions | Yes |

[^1]: Requires additional knowledge of loop device management

[^2]: GPG public key encryption uses a hybrid cipher (aka. envelope encryption). The data is encrypted using symmetric encryption, then the symmetric key is encrypted using public key encryption. Of the two encryption types, public key encryption is probably the weaker of the pair.

[^3]: Assuming GPG has been hardened as per the YubiKey Guide.

[^4]: fscrypt does not encrypt "non-filename metadata, such as timestamps, the sizes and number of files, and extended attributes" but for this use case, I'd consider the metadata leakage to be insignificant, given the limited type of data being encrypted. fscrypt defaults to (AES-256-XTS, AES-256-CBC-CTS) for (contents, filename) encryption, the former of which matches LUKS2. AES-256-XTS uses a 512 bit long master key, with only half of the key being used for the underlying AES-256 cipher, the same as the defaults used by cryptsetup.

[^5]: On Debian Live 12.10.0, cryptsetup 2.6.1 defaults to aes-xts-plain64 with a 512 bit long master key, with only half of the key being used for the underlying AES-256 cipher, the same default used by fscrypt:

```
user@debian:~$ sudo cryptsetup --version
cryptsetup 2.6.1 flags: UDEV BLKID KEYRING KERNEL_CAPI
user@debian:~$ sudo cryptsetup --help |tail -5
Default compiled-in device cipher parameters:
        loop-AES: aes, Key 256 bits
        plain: aes-cbc-essiv:sha256, Key: 256 bits, Password hashing: ripemd160
        LUKS: aes-xts-plain64, Key: 256 bits, LUKS header hashing: sha256, RNG: /dev/urandom
        LUKS: Default keysize with XTS mode (two internal keys) will be doubled.
```

[^6]: I asked on the cryptsetup mailing list about the security of LUKS2 on a loop device and Milan Broz, co-contributor of the cryptsetup FAQ (https://gitlab.com/cryptsetup/cryptsetup/-/wikis/FrequentlyAskedQuestions#a-contributors) confirmed it's equivalent to a physical device in terms of the encryption robustness: https://lore.kernel.org/cryptsetup/[email protected]/

[^7]: Tar files and a loop device file are only considered easier to exfiltrate in the sense they more easily afford the end user accidentally leaving the files lying around somewhere they shouldn't be and facilitating remote access, as opposed to a physical USB drive. However, human error could also result in the USB drive being mounted long term so it's maybe only a minor distinction.

[^8]: ext4 + fscrypt directories cannot be copied in their encrypted form, only when mounted, and are thus as resistant to image exfiltration as LUKS2 + physical device. If you attempt to copy the locked dir or anything in it, an error will occur:

    ```
    cp: cannot open 'encrypted-dir/mKsr5Y8OVRkIHKTsT3L25O2_K-pf7hntVBr3yUeuwOGyUZGZ5S11ig' for reading: Required key not available
    ```

[^9]: GPG decryption will persist plaintext to disk, and is thus vulnerable to plaintext persistence if the underlying disk is physical and not encrypted. Furthermore, even if the underlying disk is encrypted, the encryption strength is independent of the GPG encryption and could be weaker. Accidental plaintext persistence in this context is not totally disastrous, though, as the private keys are still encrypted by gpg-agent. However, gpg-agent only uses AES-128, which is expected to be vulnerable to Quantum Computing, if that ever eventuates. Furthermore, the GPG revocation key is plaintext.

[^10]: fscrypt and LUKS2 transparently decrypt data as it is read so the plaintext is never persisted to disk.

[^11]: None of the options dictate the hosting or encrypted filesystem, except for the ext4 + fscrypt option which is specifically using ext4's fscrypt support. However, in all cases, a modern journaled filesystem such as ext4 is recommended.

[^12]: Depending on the context, the ability to easily backup the encrypted data to a remote location may be a positive or negative. For example, remote backups may prove useful in locations with questionable physical security or in disaster prone areas, in which case the benefit may outweigh the negative of potential exfiltration.

[^13]: fscrypt - Backup, restore, and recovery

[^14]: fscrypt supported modes:

> Authenticated encryption modes are not currently supported because of the difficulty of dealing with ciphertext expansion

[^15]: LUKS2 has experimental support for authenticated encryption, listed in the cryptsetup man page, but EXPERIMENTAL and backup do not seem like a good match:

> Authenticated disk encryption (EXPERIMENTAL)
>   Since Linux kernel version 4.12 dm-crypt supports authenticated disk encryption.

[^16]: In my testing, systemd-cryptenroll uses PBKDF as the key derivation function when enrolling a FIDO2 device with LUKS2, whereas LUKS2 uses argon2id for passphrases. If the entropy of the FIDO2 generated credential is truly high, this shouldn't matter but I would probably feel safer if argon2id is used as a defence-in-depth measure against FIDO2 implementation weaknesses. If in doubt, I'd recommend choosing a high entropy user chosen passphrase instead and forgo the ease of use of YubiKey integration.

[^17]: I don't have convenient access to a Mac to test on and I only have limited experience with BSDs and WSL

forbytten avatar May 01 '25 09:05 forbytten

@forbytten, this is a great summary, thank you.

A couple of questions:

  1. When a user encrypts and decrypts backups while running a Linux live image, the risk of plaintext persistence is mitigated, as everything is on an in-memory filesystem, right?

  2. What is image exfiltration? And why are the risks higher in the case of tar+gpg? I couldn't find anything on this myself.

koddo avatar May 01 '25 11:05 koddo

  1. Yes. There's nothing inherently wrong with the tar + GPG approach, just that it relies on the user following correct procedure rather than the underlying tech protecting the user from themselves.

  2. For disks, I just mean the ability of an attacker to dump the encrypted blocks of the disk (the image) and transfer them to an attacker controlled destination (exfiltrate) as a precursor to an offline attack. For tar + GPG, I'm loosely generalizing the term for simplicity, treating the encrypted tar file as analogous to the encrypted disk image, even though they are technically very different.

    The data can be considered as being cold (rarely accessed) and thus its natural, long term state for each option is:

    |  | tar + GPG public key encryption | tar + GPG symmetric encryption | ext4 + fscrypt | LUKS2 + physical device | LUKS2 + loop device |
    | --- | --- | --- | --- | --- | --- |
    | cold state | encrypted file | encrypted file | encrypted directory | encrypted partition | encrypted file |

    In terms of "Resistance to image exfiltration", I just mean that an encrypted file is more easily copied from source to target, whereas for the other two options, to copy the encrypted state requires:

    1. ext4 + fscrypt: as noted in footnote 8, it can't be done by directory or file copy. So you have to dump the raw bytes of the drive, say using dd.
    2. LUKS2 + physical device: similarly, you have to dump the raw bytes of the drive.

    So say we assume the following insecure configuration due to human error: for each option, the encrypted data is stored on a thumb drive and the user accessed it on their daily system instead of an ephemeral system and the user has forgotten to remove it so it's attached long/longish term (but not decrypted). Further assume that for the encrypted file options, the thumb drive itself is not encrypted and is mounted. For these options, a remote attacker only needs file traversal capabilities in order to obtain the encrypted file, whereas for the other two options, the attacker needs RCE (remote code execution) capability, plus typically root permissions I believe, in order to dump the raw bytes of the drive, which are much more stringent requirements.

    Similar to your first question, as long as your operational security practices are fine, IMHO there's no significant issue with the encrypted file options - I am personally partial to LUKS2 + loop device. It's just a criterion I thought worth mentioning in the table. I should have also mentioned the criteria are not sorted in terms of relative importance, as that's mostly context dependent.

forbytten avatar May 02 '25 11:05 forbytten

@forbytten, thank you for the explanation and analysis.

koddo avatar May 02 '25 14:05 koddo

I would suggest another criterion to be KDF (key derivation function) support. The KDF is a function that transforms/derives the real symmetric encryption key from the passphrase. Its utility is twofold in this context:

  1. Produce a key that has higher entropy than a source which potentially has lower entropy.
  2. Be a password hashing function so that it's expensive to compute, in order to make brute forcing the password computationally infeasible. OWASP has a decent primer on password hashing.

Comparing the backup options:

  • GPG public key encryption: KDF isn't applicable, exchanging the whole concept for the security of the public key encryption, as the public key is used to encrypt the symmetric key, the latter of which I believe GPG will securely generate with high entropy.

  • GPG symmetric encryption

    • The latest OpenPGP spec supports Argon2id. Proton's blog back in Sep, 2023 also mentions that Argon2id is preferable.
    • However, GPG has not added support for Argon2id based on its NEWS page and therefore only supports the old and broken "Iterated and Salted S2K".
  • fscrypt has two components:

    • kernel level. Does not use a KDF, instead delegating the responsibility to the userspace tool.
    • userspace tool. Uses Argon2id, although it's unclear if the default computation time of 1 second can be overridden.
  • LUKS2 defaults to Argon2id these days. Computation time can be increased via the cryptsetup luksFormat --iter-time option. For example, if you specify 5000 milliseconds, cryptsetup will choose parameters that result in one iteration taking about 5 seconds to compute on your hardware.

Now, the KDF only adds utility if your passphrase is weak or relatively weak. So if you decide on using GPG symmetric encryption to encrypt your backup, as @koddo seems to be, I would say it's even more important to choose a high entropy passphrase. 256 bits of entropy would be ideal, IMHO, matching the AES-256 key size and completely removing the KDF from the threat model.

forbytten avatar May 25 '25 16:05 forbytten

@forbytten, hm. Isn't recommended password entropy around 100 bits or slightly above? To me 256 bits is overkill. This passphrase would be really hard to remember.

It would be easier to replace rsa key with ed25519 and remember it directly, then no backup is needed.

koddo avatar May 26 '25 12:05 koddo

@forbytten, if it's advisable to use gpg symmetric encryption with argon2id, a normal good passphrase could be just hashed using the argon2 command line utility before feeding it to gpg. If I understand this correctly.

Thanks for the suggestion though, TIL.

koddo avatar May 26 '25 13:05 koddo

Actually, hashing your passphrase yourself and feeding it to GPG may be a good idea. Just be careful how you do it. For example, the GPG man page states for --passphrase-fd n that "Only the first line will be read from file descriptor n" and GPG doesn't appear to have an option to provide a raw key. So if you were to accidentally pipe raw bytes of an Argon2id hash into GPG and, say, the first byte just happens to be 0x0a, you'll have used a terminally bad passphrase. Actually, that's not strictly true because GPG does some basic checking of the passphrase but the general idea of weakening the hash still holds. If you use the hex encoded hash, I think you should be good to go. However, I feel obliged here to suggest that any time you roll your own crypto, and I'd include "glueing existing boxes together" in "roll your own", you maybe shouldn't. It's easy to get burned unless you really know what you're doing. But if you're up for the challenge, it can also be interesting :)

You raise a couple of other points I'd also like to expand upon.

Firstly, regarding entropy, N bits of apparent entropy in one context isn't necessarily the same as N bits of real entropy in another context. Let's look at symmetric encryption generically first. With today's computing power and projections in the short to medium term, 128 bits of entropy is generally considered sufficient but less than that is considered weak. Example references:

  1. https://cheatsheetseries.owasp.org/cheatsheets/Cryptographic_Storage_Cheat_Sheet.html#algorithms : "For symmetric encryption AES with a key that's at least 128 bits (ideally 256 bits)"

  2. https://csrc.nist.gov/pubs/sp/800/57/pt1/r5/final page 59, Table 4

That 128 bits of entropy refers to the raw encryption key itself and has nothing to do with the user's passphrase. As you pointed out, when a user chooses a passphrase, the passphrase itself typically has less than 128 bits of entropy. So if you use that passphrase directly as the encryption key, your encryption key no longer has 128 bits of entropy but has the passphrase's entropy and AES-128 effectively becomes AES-N, where N < 128 and often significantly less than 128 and hence the encryption is no longer strong.

This is where KDFs come into play. They stretch the passphrase into the full AES-128 or AES-256 key space. However, if the KDF is cheap to compute, an attacker can just brute force the typically much smaller passphrase search space, and you still end up with AES-N, where N < 128. So a KDF that is also a good password hashing function, like Argon2id, should be used to make the search for the correct passphrase computationally impractical.

With GPG's KDF of "Iterated and Salted S2K", you could do some experiments to determine exactly how fast it is to compute but the OpenPGP spec tells us it's a simple algorithm and AFAIK, the hash function can only be one of those listed by gpg --version, which are all fast to compute, as they are not password hashing functions. So effectively, we can view GPG's KDF as approximately the same as having no KDF at all. So if you choose a password/passphrase that has around 100 bits of entropy, you effectively end up with AES-100, falling within NIST's "Disallowed" protection category.

Okay, so if you choose a passphrase with 128 bits of entropy, are you good to go now? Enter the prospect of Quantum Computing. Grover's algorithm is expected to be able to effectively halve the bit strength so AES-128 reduces to AES-64. So for future proofing, AES-256 is preferred, which is expected to be Quantum Computing resistant. But if you use AES-256 with GPG but only a 128 bit entropy password, you effectively end up with AES-128 again. That's why I stated ideally you'd use a 256 bit entropy passphrase.
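
The reasoning in the last few paragraphs can be written as a small calculation. This is only the simplified model used above (no credit given for KDF slowdown, and Grover's algorithm treated as halving the bit strength), and `effective_bits` is a hypothetical helper name.

```shell
# Simplified model from the paragraphs above:
#   classical strength = min(cipher key bits, passphrase entropy bits)
#   quantum strength   = classical strength / 2   (Grover's algorithm)
effective_bits() {
    key_bits=$1
    pass_bits=$2
    if [ "$pass_bits" -lt "$key_bits" ]; then
        classical=$pass_bits
    else
        classical=$key_bits
    fi
    echo "classical=$classical quantum=$((classical / 2))"
}

effective_bits 128 100   # AES-128 with a ~100 bit passphrase
effective_bits 256 128   # AES-256 with a 128 bit passphrase
effective_bits 256 256   # AES-256 with a 256 bit passphrase
```

Only the last combination keeps 128 bits of strength in the quantum column, which is the case the 256 bit recommendation is aiming for.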

This brings us to your second point of a 256 bit entropy passphrase being hard to remember. I would suggest that attempting to memorize a passphrase for a backup that is expected to be rarely used, isn't advisable. IMHO, memorized passphrases are best reserved for frequently/relatively frequently used operations. In this situation, I'd recommend a password manager that uses a decent KDF, memorize the passphrase for that and use it to generate a 256 bit entropy password for the GPG encrypted backup.
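
For a sense of what a 256 bit entropy password looks like, here is a minimal sketch that reads 32 random bytes and prints them as hex. It assumes /dev/urandom is available, `random_hex_256` is a hypothetical name, and a password manager's own generator would normally be used instead.

```shell
# Print 32 bytes (256 bits) from /dev/urandom as 64 hex characters.
# Hex keeps the result on a single printable line.
random_hex_256() {
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
}
```

Each hex character carries 4 bits, so the 64 characters printed by `random_hex_256` correspond to 256 bits.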

forbytten avatar May 26 '25 16:05 forbytten

An example of incorrectly feeding argon2 to gpg is below. I'm not going to try for the correct way because I'm not sure I endorse it or maybe it's because I'm too cowardly :)

Encrypt:

$ echo hello > temp.txt
$ echo dummypassphrase | argon2 SALTSALT -id | gpg -c --batch --passphrase-fd 0  temp.txt

Can be decrypted using any passphrase!

$ echo wrongpassphrase | argon2 SALTSALT -id |gpg --batch --passphrase-fd 0 --decrypt temp.txt.gpg
gpg: AES256.CFB encrypted data
gpg: encrypted with 1 passphrase
hello

The issue is that argon2 by default outputs lines of text, with the first line being a known constant:

$ echo dummypassphrase | argon2 SALTSALT -id
Type:           Argon2id
Iterations:     3
Memory:         4096 KiB
Parallelism:    1
Hash:           dc8d28fb3660141f36fb59dc5ef9f43101a5b7f27cfec3fba7058a193133aaf3
Encoded:        $argon2id$v=19$m=4096,t=3,p=1$U0FMVFNBTFQ$3I0o+zZgFB82+1ncXvn0MQGlt/J8/sP7pwWKGTEzqvM

forbytten avatar May 26 '25 17:05 forbytten

> any time you roll your own crypto, and I'd include "glueing existing boxes together" in "roll your own", you maybe shouldn't

@forbytten, that's a good point.

Thanks for elaborate answers, I learned a lot today.

> if you were to accidentally pipe raw bytes of an Argon2id hash into GPG and, say, the first byte just happens to be 0x0a, you'll have used a terminally bad passphrase

One thing: there's a -r option for argon2, which is for "output only the raw bytes of the hash", but it seems to output hex-encoded hash, not binary data:

$ echo dummypassphrase | argon2 SALTSALT -id
Type:		Argon2id
Iterations:	3
Memory:		4096 KiB
Parallelism:	1
Hash:		dc8d28fb3660141f36fb59dc5ef9f43101a5b7f27cfec3fba7058a193133aaf3
Encoded:	$argon2id$v=19$m=4096,t=3,p=1$U0FMVFNBTFQ$3I0o+zZgFB82+1ncXvn0MQGlt/J8/sP7pwWKGTEzqvM
0.013 seconds
Verification ok

$ 
$ 
$ echo dummypassphrase | argon2 SALTSALT -id -r
dc8d28fb3660141f36fb59dc5ef9f43101a5b7f27cfec3fba7058a193133aaf3

Does this mitigate the problem? Could be pipelined like echo dummypassphrase | argon2 SALTSALT -id -r | tee /dev/tty | gpg ... to double check the output, although this is not necessary.

And all the parameters should be specified as well and saved along with salt, this is just an example.
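
A sketch of that pipeline with the parameters spelled out and a format check before anything reaches gpg. The helper names are hypothetical and the parameter values are left to the caller; the check exists so that multi-line or non-hex argon2 output can never be used as the passphrase.

```shell
# Hypothetical helpers. Whatever parameters are used must be recorded,
# since the same values are needed again to decrypt.

# Succeed only if the argument is exactly 64 lowercase hex characters,
# which also rules out embedded newlines.
is_hex64() {
    [ "${#1}" -eq 64 ] || return 1
    case $1 in
        *[!0-9a-f]*) return 1 ;;
    esac
}

# Read a passphrase on stdin and print the hex-encoded Argon2id hash.
# Arguments: SALT ITERATIONS MEMORY_LOG2_KIB PARALLELISM
derive_key() {
    argon2 "$1" -id -t "$2" -m "$3" -p "$4" -l 32 -r
}

# Encrypt FILE with the derived key; passphrase is read from stdin.
# Arguments: FILE SALT ITERATIONS MEMORY_LOG2_KIB PARALLELISM
encrypt_with_derived_key() {
    key=$(derive_key "$2" "$3" "$4" "$5") || return 1
    is_hex64 "$key" || { echo "unexpected argon2 output" >&2; return 1; }
    printf '%s\n' "$key" | gpg --batch --passphrase-fd 0 --symmetric "$1"
}
```

Usage would look like `echo dummypassphrase | encrypt_with_derived_key temp.txt SALTSALT 3 12 1`, where `-m 12` means 2^12 KiB, the 4096 KiB shown in the output above.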

koddo avatar May 26 '25 18:05 koddo

@koddo yes, if I were to use that approach, that's how I'd do it. However, I would still recommend/prefer the password manager approach, especially since now you don't just have a passphrase to remember, you also have the salt, iteration count, memory usage and parallelism settings to remember. Applications that use Argon2 typically persist those along with the ciphertext. For example, LUKS2 saves them in the LUKS header, which you can view with sudo cryptsetup luksDump DEVICE

forbytten avatar May 27 '25 02:05 forbytten

@koddo not sure what I was thinking. You don't have to memorize the salt, iteration count, memory usage and parallelism. You can just stash them in a text file alongside the backup. They aren't secret values, albeit if they were secret, an attacker will have a harder time.

forbytten avatar May 27 '25 03:05 forbytten

@forbytten, yeah, for the sake of discussion: the downside of this is an added step of generating salt and saving all these parameters for argon2, but this could as well be automated though. And this data is another file besides the gpg file itself. Could be saved in filename, but this doesn't seem reliable enough.

koddo avatar May 27 '25 05:05 koddo

@koddo I thought I'd throw in another idea as food for thought as an alternative to the "saved in filename" concept you mentioned. Linux filesystems like ext4 support extended attributes, which are used, for example, by SELinux. However, something I've never used before is the feature that allows users to add arbitrary metadata to a file. I'm not sure if this is a good idea, as the presence of the resulting metadata isn't as obvious as having a metadata file sitting alongside the backup file but examples of how it works are below. Another limitation is if you want to copy the backup to a destination that will not preserve the metadata, such as cloud storage but if you only intend to use portable storage devices, that's not a hurdle. If you use Windows, I think NTFS also supports a similar concept of alternate data streams.

  1. Install attr package:

    sudo apt install attr
    
  2. Set some attributes in the user namespace:

    $ echo hello > file.txt
    $ setfattr -n user.argon2.salt -v SALTSALT file.txt
    $ setfattr -n user.argon2.iterations -v 20 file.txt
    
  3. Get attributes

    $ getfattr file.txt
    # file: file.txt
    user.argon2.iterations
    user.argon2.salt
    
    $ getfattr -n user.argon2.salt file.txt
    # file: file.txt
    user.argon2.salt="SALTSALT"
    

But you have to be careful when copying the file as the attributes are not copied by default. I think this alone is probably the strongest argument against using this approach but if you front your operations with scripts, possibly it won't be a significant hurdle:

```
$ cp file.txt filecopy.txt
$ getfattr filecopy.txt
```

Copying the file in archive mode will preserve all attributes:

    $ cp -a file.txt filecopy.txt
    $ getfattr filecopy.txt
    # file: filecopy.txt
    user.argon2.iterations
    user.argon2.salt

You can also explicitly back up the attributes to a file, but I don't think this is useful for this use case. You may as well have had them in a file to begin with.

    $ getfattr --dump file.txt > file_attributes.txt
    $ cat file_attributes.txt
    # file: file.txt
    user.argon2.iterations="20"
    user.argon2.salt="SALTSALT"

forbytten avatar May 28 '25 08:05 forbytten

@forbytten, it's unfortunate that the metadata is so fragile. I wouldn't use it for anything that could ever leave an ext4 filesystem.

Another option would be putting this argon2-parameters file alongside the gpg-encrypted data in a tar archive; this way it's not going to be lost. This is much more user-friendly if we are designing a fool-proof way to make backups.
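
A minimal sketch of that bundling, assuming the encrypted backup and the parameters file already exist. The names `bundle-demo`, `backup.tar.gpg`, `argon2-params.txt` and `bundle.tar` are hypothetical, and the two placeholder files only stand in for real ones so the commands can be tried safely:

```shell
#!/bin/sh
# Sketch: keep the (non-secret) Argon2 parameters and the gpg-encrypted
# data together in one outer tar archive. All names are illustrative.
set -eu

demo_dir=bundle-demo
mkdir -p "$demo_dir"

# Placeholders standing in for the real encrypted backup and its parameters.
printf 'ciphertext placeholder\n' > "$demo_dir/backup.tar.gpg"
printf 'type=argon2id\nsalt=SALTSALT\niterations=20\n' > "$demo_dir/argon2-params.txt"

# The outer archive is not encrypted; it only keeps the two files together.
tar -cf "$demo_dir/bundle.tar" -C "$demo_dir" argon2-params.txt backup.tar.gpg

# List the contents to confirm both files are inside.
tar -tf "$demo_dir/bundle.tar"
```

Since only the inner `.gpg` file is encrypted, anyone holding the bundle can read the parameters, which is fine as long as they are treated as non-secret.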

koddo avatar May 29 '25 12:05 koddo

Thank you @forbytten for all the extra details you provide here.

My target use-case is:

  1. file-based backup for convenience;
  2. as you defined earlier: being able to back up data remotely

I have been using tar+gpg-symmetric, which I now realize, thanks to your comment in https://github.com/drduh/YubiKey-Guide/issues/477#issuecomment-2910993754, is problematic unless both of the following are true: the encryption passphrase has enough entropy and the gpg config is hardened (with the default config, even a high-entropy passphrase will be KDF'd into a 128-bit key).

With this, I think I will be switching to LUKS2+loop. Not only are the defaults good, but the added benefit of the partition being unmounted after reboot reduces the risk of operator error leaving unencrypted material anywhere.

dimitry12 avatar May 29 '25 16:05 dimitry12

> high-entropy passphrase will be KDF'd into a 128-bit key

@dimitry12, as far as I understand, this only applies to asymmetric encryption, because gpg-agent protects private keys with only AES-128 for some reason. The good news is that this doesn't concern us, because private keys never leave YubiKeys.

And when we use gpg for symmetric encryption of backups, it does use the cipher specified by the user, which is AES-256, thanks to the hardened gpg.conf from this guide.

    $ gpg -c test.txt
    $ gpg -vv test.txt.asc
    ...
    :symkey enc packet: version 4, cipher 9, aead 0,s2k 3, hash 10
    ...

Cipher 9 here means AES-256, and hash 10 is SHA-512; see RFC 4880, sections 9.2 and 9.4.
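
To make the mapping concrete, here is a small sketch that translates the numeric IDs from such a packet line into names. The helper names `cipher_name`, `hash_name` and `describe_symkey_packet` are hypothetical, and only a subset of the IDs from RFC 4880 sections 9.2 and 9.4 is covered:

```shell
#!/bin/sh
# Sketch: translate the numeric cipher and hash IDs from a gpg
# ":symkey enc packet" line into names, per RFC 4880 sections 9.2 and 9.4.
# Helper names are hypothetical; only a subset of IDs is mapped.

cipher_name() {
  case "$1" in
    2) echo "TripleDES" ;;
    3) echo "CAST5" ;;
    7) echo "AES-128" ;;
    8) echo "AES-192" ;;
    9) echo "AES-256" ;;
    *) echo "unknown($1)" ;;
  esac
}

hash_name() {
  case "$1" in
    2) echo "SHA-1" ;;
    8) echo "SHA-256" ;;
    9) echo "SHA-384" ;;
    10) echo "SHA-512" ;;
    *) echo "unknown($1)" ;;
  esac
}

describe_symkey_packet() {
  # Pull out the numbers that follow "cipher" and "hash" in the line.
  cipher_id=$(printf '%s\n' "$1" | sed -n 's/.*cipher \([0-9][0-9]*\).*/\1/p')
  hash_id=$(printf '%s\n' "$1" | sed -n 's/.*hash \([0-9][0-9]*\).*/\1/p')
  echo "cipher=$(cipher_name "$cipher_id") hash=$(hash_name "$hash_id")"
}

# The packet line from the output above.
describe_symkey_packet ':symkey enc packet: version 4, cipher 9, aead 0,s2k 3, hash 10'
# prints: cipher=AES-256 hash=SHA-512
```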

More good news: S2K doesn't reduce entropy; it preserves the entropy of the passphrase. Unfortunately, though, it's weak at passphrase stretching, which is why argon2id is recommended instead, or in addition, as discussed above.

> added benefit of the partition being unmounted after reboot

LUKS+loop is a good choice though, especially when keys are generated in non-ephemeral environments.

koddo avatar May 30 '25 09:05 koddo

> Another option would be putting this argon2-parameters file alongside the gpg-encrypted data in a tar archive; this way it's not going to be lost. This is much more user-friendly if we are designing a fool-proof way to make backups.

@koddo It also better corresponds to what gpg would do if it actually supported Argon2id: prefix the binary with the KDF parameters. So it does seem much neater this way.

forbytten avatar May 30 '25 11:05 forbytten

> With this, I think I will be switching to LUKS2+loop. Not only are the defaults good, but the added benefit of the partition being unmounted after reboot reduces the risk of operator error leaving unencrypted material anywhere.

@dimitry12 The auto unmount is a property I like too. Saves me from myself. With tar+gpg-symmetric, though, if you can develop a habit of only decrypting into /dev/shm, at least on Linux, you can achieve a similar outcome, regardless of ephemeral/non-ephemeral environment, since /dev/shm is tmpfs on most, if not all, Linux distros. Not sure about non-Linux systems.

    $ df -h /dev/shm
    Filesystem      Size  Used Avail Use% Mounted on
    tmpfs           3.9G     0  3.9G   0% /dev/shm
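
If you front this with a script, the habit can be enforced rather than remembered. A rough sketch, assuming GNU coreutils `stat`; the function name `make_ram_scratch`, the directory prefix and the archive name in the comment are hypothetical:

```shell
#!/bin/sh
# Sketch: create a private scratch directory for decrypted material, but
# only if the base directory is really tmpfs (RAM-backed). Assumes GNU
# coreutils `stat`; function, prefix and archive names are hypothetical.
set -eu

make_ram_scratch() {
  base="${1:-/dev/shm}"
  # `stat -f -c %T` prints the filesystem type, e.g. "tmpfs".
  fstype=$(stat -f -c %T "$base" 2>/dev/null || echo unknown)
  if [ "$fstype" != "tmpfs" ]; then
    echo "refusing: $base is '$fstype', not tmpfs" >&2
    return 1
  fi
  # mktemp -d creates the directory with mode 0700 (owner-only access).
  mktemp -d "$base/backup-restore.XXXXXX"
}

if scratch=$(make_ram_scratch /dev/shm); then
  echo "scratch directory: $scratch"
  # Decrypt here, e.g.:
  #   gpg --output "$scratch/backup.tar" --decrypt backup.tar.gpg
  # Remove the plaintext as soon as you are done with it:
  rm -rf -- "$scratch"
fi
```

One caveat: tmpfs pages can be written out to swap, so this only approximates the LUKS behaviour if swap is disabled or encrypted.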

forbytten avatar May 30 '25 11:05 forbytten