RSA modpow implementation is not constant-time

Hi there,

I'm the author of sidefuzz (https://github.com/phayes/sidefuzz) and I have found what appears to be variable-time behavior in the rsa::internals::encrypt() function. Specifically, rsa::internals::encrypt() appears to be variable-time in relation to the message. Note that I haven't worked this up into an actual exploit, but merely demonstrated that this function isn't constant-time in relation to the message inputed.

Specifically, the message 20d90c8af42aac9b1ee53dc9a0187201 takes 549894 instructions to encrypt, while the message 5a28cec68d47f6fe3b1df54c9f320f6d takes 552427 instruction to encrypt. This is a difference of 2533 instructions, or about 0.5%. So it's a slight difference, but probably exploitable with sufficient sampling.

I have crated a fuzzing targets here: https://github.com/phayes/sidefuzz-targets

You can confirm this difference with the sidefuzz tool like so:

sidefuzz check ./target/wasm32-unknown-unknown/release/rsa_encrypt_message.wasm 5a28cec68d47f6fe3b1df54c9f320f6d 20d90c8af42aac9b1ee53dc9a0187201
samples: 20000, t-value: 219771.0790572351, confidence: 100%
Found timing difference of 2533 instructions between these two inputs with 100% confidence:
input 1: 5a28cec68d47f6fe3b1df54c9f320f6d (552427 instructions)
input 2: 20d90c8af42aac9b1ee53dc9a0187201 (549894 instructions)

My first suspicion was that this was due to num_bigint_dig::BigUint::from_bytes_be() being variable-time, but fuzzing that function specifically results in what appears to be constant-time behavior. So I'm not actually sure where the problem is.

Apr 30 '19 19:04 phayes

IIRC RSA blinding (which I believe is used in this implementation) uses random number, so while execution time is still variable, it does not correlate with a protected secret.

Apr 30 '19 20:04 newpavlov

Hi @newpavlov ,

I don't think that's the case here. The fuzzing target is directly testing this function:

/// Raw RSA encryption of m with the public key. No padding is performed.
#[inline]
pub fn encrypt<K: PublicKey>(key: &K, m: &BigUint) -> BigUint {
    m.modpow(key.e(), key.n())

So the problem is going to be in the implementation of modpow.

I've also got another fuzzing target that does full pkcs1v15 padding with a statically seeded CPRNG. It displays the same variable-time behavior (with a statically seeded deterministic PRNG mind-you), but the first fuzzing target was the more minimal case, so that's what I reported. (sidefuzz also accounts for RNGs by repeated sampling and taking a t-value, but this doesn't apply here)

Admittedly, this could also be an artifact of the fuzzer (which would be a bug in the fuzzer), but I don't think that's the case here either.

Apr 30 '19 20:04 phayes

thank you for the report. I have not hardned the underlying modpow impl for const timing yet, so I am not surprised. I will look into it as soon as I find the time.

On 30. Apr 2019, 22:55 +0200, Patrick D Hayes [email protected], wrote:

Hi @newpavlov , I don't think that's the case here. The fuzzing target is directly testing this function: /// Raw RSA encryption of m with the public key. No padding is performed. #[inline] pub fn encrypt<K: PublicKey>(key: &K, m: &BigUint) -> BigUint { m.modpow(key.e(), key.n()) So the problem is going to be in the implementation of modpow, which only takes secret data as input. I've also got another fuzzing target that does full pkcs1v15 padding with a statically seeded CPRNG. It displays the same variable-time behavior (with a statically seeded PRNG mind-you), but the first fuzzing target was the more minimal case, so that's what I reported. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

Apr 30 '19 21:04 dignifiedquire

https://github.com/rust-num/num-bigint/blob/7562ab24330792817e42b808f60b0cac51ca261a/src/monty.rs#L69-L73
https://github.com/rust-num/num-bigint/blob/7562ab24330792817e42b808f60b0cac51ca261a/src/monty.rs#L121
https://github.com/rust-num/num-bigint/blob/7562ab24330792817e42b808f60b0cac51ca261a/src/monty.rs#L207-L219

may be causing this.

Jun 25 '20 00:06 jedisct1

crypto-bigint now has a constant-time modpow implementation in the form of DynResidue::pow, however it uses Montgomery form internally so it may not be the fastest option for straight modpow since converting in/out of Montgomery form is costly, and it still monomorphizes around a fixed number of limbs.

For the purposes of rsa (and dsa) we probably need to add a new Uint type like UintVec which uses a number of limbs chosen at runtime as a proper num_bigint_dig::BigUint replacement.

Edit: work on crypto_bigint::BoxedUint has started.

Apr 25 '23 03:04 tarcieri

~~Until this is addressed it seems like something we should more prominently highlight in security-related documentation.~~

Edit: opened #373 (and merged!)

Sep 05 '23 21:09 tarcieri

Hi! :wave:

I've recently been working on RSA side-channel attacks, and found that many implementations are vulnerable. That's described in the Marvin Attack.

Thanks to help by @ueno, who contributed the test harness, I was able to run the test against rust-crypto (exact versions of all packages are in the PR).

Unfortunately I have to report that the side-channel leakage from the numerical library is very substantial and easily detectable over the network. As such, I consider RustCrypto RSA to be vulnerable and exploitable.

Test results from a run with 100k repeats per probe on an i9-12900KS @ 5.225GHz:

Sign test mean p-value: 0.2189, median p-value: 0.06354, min p-value: 3.946e-60
Friedman test (chisquare approximation) for all samples
p-value: 8.827139162990364e-108
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -8.07298e-07s, 95% CI: -9.06918e-07s, -7.019407e-07s (±1.025e-07s)
Median of differences: -8.44500e-07s, 95% CI: -9.78000e-07s, -7.200000e-07s (±1.290e-07s)
Trimmed mean (5%) of differences: -7.27307e-07s, 95% CI: -8.13648e-07s, -6.316737e-07s (±9.099e-08s)
Trimmed mean (25%) of differences: -8.48652e-07s, 95% CI: -9.17708e-07s, -7.769631e-07s (±7.037e-08s)
Trimmed mean (45%) of differences: -9.56301e-07s, 95% CI: -1.05768e-06s, -8.603802e-07s (±9.865e-08s)
Trimean of differences: -4.72000e-07s, 95% CI: -5.76750e-07s, -3.635625e-07s (±1.066e-07s)

pairwise results are in report.txt

confidence intervals for the measurements are as follows: conf_interval_plot_trim_mean_25

legend:

ID,Name
0,header_only
1,no_header_with_payload_48
2,no_padding_48
3,no_structure
4,signature_padding_8
5,valid_0
6,valid_48
7,valid_192
8,valid_246
9,valid_repeated_byte_payload_246_1
10,valid_repeated_byte_payload_246_255
11,zero_byte_in_padding_48_4

explanations of that are in the step2.py script in marvin-toolkit repo.

In other words, the protection from adding blinding is not sufficient, and Rust Crypto has at least the same issue as the CVE-2022-4304 in OpenSSL.

Nov 22 '23 15:11 tomato42

@tomato42 indeed that's using PKCS#1v1.5 without random blinding, so with modpow being non-constant-time, it's to be expected. We should definitely get rid of any APIs which permit private key use without random blinding.

Nov 22 '23 15:11 tarcieri

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

Nov 22 '23 15:11 tarcieri

@tarcieri If the de-blinding isn't performed using constant-time code, then use of blinding won't remove the side-channel signal, that's what the bug in OpenSSL was about: removal of the blinder and conversion to the constant-length bytes needs to be side-channel free too.

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

ah, sorry about that; since this was linked as a security issue, I've assumed that the effect of it on security of RSA decryption was assumed public too.

Nov 22 '23 15:11 tomato42

Our answer until now has been that the application of random blinding prevented such sidechannels, so the fact that isn't the case is news to us

Nov 22 '23 15:11 tarcieri

sorry, I'm confused, on one hand you say that the the privkey.decrypt(Pkcs1v15Encrypt, &buffer); won't use blinding, yet that's the public API that you say is protected by use of blinding in README.md...?

Nov 22 '23 15:11 tomato42

The README.md is wrong in that case.

(Also I'm just discovering this and haven't had time to look over any code yet to confirm specific details)

Nov 22 '23 15:11 tarcieri

ok, I'm still running test with a larger sample, to see if there aren't additional sources of leakage. If you have any additional questions, feel free to ask

Nov 22 '23 15:11 tomato42

I'd be curious if you saw similar sidechannels in non-PKCS#1v1.5 constructions like OAEP decryption or PSS signatures

Nov 22 '23 15:11 tarcieri

if you could propose a change on top of that PR that adds an OAEP decryption, I can run tests for that too (there are scripts in marvin-toolkit for testing OAEP too)

but since the leak is clearly from the numerical library, that means OAEP is also vulnerable, as that leak happens before any OAEP checks

even if PSS leaks like that, it doesn't make the implementation as easily exploitable, as both PKCS#1v1.5 an OAEP attacks are chosen ciphertext attacks, with PSS (or with PKCS#1 v1.5 signatures) the attacker can only reasonably affect the hash being signed, not the whole plaintext (in other words, it's a fundamentally different attack, so the test for it needs to be different too)

Nov 22 '23 16:11 tomato42

Can you confirm you see the sidechannel against the raw blinded modpow operation (when used with e.g. rand_core::OsRng)? That's really what I'm curious about.

Nov 22 '23 16:11 tarcieri

Sorry, I'm not a rust developer, you will need to hand-hold me (provide the complete code to execute) if you want me to run any tests

Nov 22 '23 16:11 tomato42

The branch containing that code has disappeared: https://github.com/ueno/marvin-toolkit/tree/wip/rust-crypto

You can test the low-level "hazmat" API: https://docs.rs/rsa/latest/rsa/hazmat/fn.rsa_decrypt.html

It would look something like:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-c0f6b76f1d52cf643fb2f30ef85b81b20ace1aa7577d1f4603db0e091e052d66R50-R63

...replaced with:

    loop {
        let mut buffer = vec![0; args.size];
        let n = infile.read(&mut buffer)?;
        if n == 0 {
            break;
        }

        assert!(n == buffer.len());

        let now = Instant::now();
        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ =  rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);
        writeln!(outfile, "{}", now.elapsed().as_nanos())?;
    }

Note you'll need to add rand_core as a dependency and enable the hazmat feature of rsa:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-76856434fe4ad8f1f778ba0452c8d8e49e764d25adf1374d385634cd251e1ca2R6-R9

[dependencies]
anyhow = "1"
clap = { version = "4", features = ["derive"] }
rand_core = { version = "0.6", features = ["getrandom"] }
rsa = { version = "0.9", features = ["hazmat"] }

Nov 22 '23 16:11 tarcieri

(will test rsa_decrypt from hazmat later)

With 1 million observations per sample (same setup) I'm getting:

Sign test mean p-value: 0.09834, median p-value: 2.633e-08, min p-value: 0.0
Friedman test (chisquare approximation) for all samples
p-value: 0.0
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -2.06196e-06s, 95% CI: -2.09851e-06s, -2.022175e-06s (±3.817e-08s)
Median of differences: -3.22400e-06s, 95% CI: -3.24600e-06s, -3.205000e-06s (±2.050e-08s)
Trimmed mean (5%) of differences: -1.99461e-06s, 95% CI: -2.02364e-06s, -1.965729e-06s (±2.895e-08s)
Trimmed mean (25%) of differences: -2.08608e-06s, 95% CI: -2.10654e-06s, -2.067761e-06s (±1.939e-08s)
Trimmed mean (45%) of differences: -3.09282e-06s, 95% CI: -3.11547e-06s, -3.070944e-06s (±2.227e-08s)
Trimean of differences: -2.22769e-06s, 95% CI: -2.25800e-06s, -2.198494e-06s (±2.975e-08s

conf_interval_plot_trim_mean_25

and pairwise results: report.txt

Which show that the errors in padding checks also leak (compare valid_48 to valid_246: sign test p-value of 5.13e-7)

Nov 22 '23 18:11 tomato42

ran the tests against this:

        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ =  rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);

and have confirmed that it's leaking too.

Same overall configuration but with 2.5 million observations per probe:

Sign test mean p-value: 0.1915, median p-value: 0.008017, min p-value: 1.428e-15
Friedman test (chisquare approximation) for all samples
p-value: 6.50448690941627e-27
Worst pair: 0(no_padding_48), 3(too_short_payload_0_1)
Mean of differences: -1.11105e-07s, 95% CI: -1.42260e-07s, -7.775684e-08s (±3.225e-08s)
Median of differences: -9.50000e-08s, 95% CI: -1.19000e-07s, -7.300000e-08s (±2.300e-08s)
Trimmed mean (5%) of differences: -9.89875e-08s, 95% CI: -1.25208e-07s, -7.384255e-08s (±2.568e-08s)
Trimmed mean (25%) of differences: -9.23689e-08s, 95% CI: -1.18946e-07s, -6.803432e-08s (±2.546e-08s)
Trimmed mean (45%) of differences: -9.30858e-08s, 95% CI: -1.17708e-07s, -7.039496e-08s (±2.366e-08s)
Trimean of differences: -9.47500e-08s, 95% CI: -1.20250e-07s, -7.056250e-08s (±2.484e-08s)

conf_interval_plot_trim_mean_45

Since for raw RSA padding doesn't matter I've executed a slightly different set of tests:

ID,Name
0,no_padding_48
1,no_structure
2,signature_padding_8
3,too_short_payload_0_1
4,too_short_payload_0_3
5,too_short_payload_0_7
6,too_short_payload_0_15
7,valid_0
8,valid_repeated_byte_payload_245_0

and pairwise statistical test results: report.txt

Nov 23 '23 01:11 tomato42

I'm not sure what solution we can provide here short of a fully constant time implementation (which was the planned long-term solution for this problem, using e.g. a Barrett reduction as provided by crypto-bigint, but that will require a substantial amount of work)

I guess I need to figure out how to run your reproducer.

Can you explain a bit more what those charts are showing? Are you actually able to recover keys?

Nov 23 '23 01:11 tarcieri

I'm not sure what solution we can provide here short of a fully constant time implementation

what's necessary, is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert the whatever arbitrary precision format is in use for the modulo exponentiation into constant length integers, then multiply them using real constant-time code, and finally do constant time reduction modulo. Something like what's in https://github.com/tomato42/ctmpi but in rust, not in C. Leakage in conversion from one format to the other is not a problem as that value is uncorrelated with the RSA plaintext, so leakage of it doesn't provide useful information to the attacker (as long as the blinding/unblinding factors remain secret).

That being said, if your mod_exp is leaky, you really have to employ both base blinding and exponent blinding. Details of that, as well as links to papers showing attacks against implementations that used just one kind of blinding are in the https://datatracker.ietf.org/doc/html/draft-kario-rsa-guidance-02 But to fix raw RSA implementation against Bleichenbacher/Marvin you just need to fix the last multiplication and reduction modulo.

Can you explain a bit more what those charts are showing?

they show the estimation of the difference in timing for different inputs. Like in the last one, no_padding_48 (probe 0) is processed about 83 ns slower than no_structure (probe 1): that's why it's marked 1-0 (or "probe 1 time subtract probe 0 time")

Are you actually able to recover keys?

those graphs show that it's possible, using time of the decryption, to execute one instance of the Bleichenbacher oracle. To decrypt a RSA ciphertext (or forge a signature) the attacker would need to run the test good few thousand times. See the details on the Marvin vulnerability page and in the associated papers. Since my goal is proving that there is no side-channel, not actually attacking implementations, I don't have good tooling to actually perform the attack (but then there's 25 years of literature on the topic, so once you have a working oracle, there are ready solutions for the rest).

Nov 23 '23 02:11 tomato42

Another run for rsa_decrypt, this time with 10.5 million observations per sample:

Sign test mean p-value: 0.1359, median p-value: 2.134e-07, min p-value: 1.433e-60
Friedman test (chisquare approximation) for all samples
p-value: 9.61782206869971e-128
Worst pair: 0(no_padding_48), 8(valid_repeated_byte_payload_245_0)
Mean of differences: -8.28395e-08s, 95% CI: -9.87044e-08s, -6.762672e-08s (±1.554e-08s)
Median of differences: -9.10000e-08s, 95% CI: -1.02000e-07s, -8.000000e-08s (±1.100e-08s)
Trimmed mean (5%) of differences: -8.62429e-08s, 95% CI: -9.87246e-08s, -7.492710e-08s (±1.190e-08s)
Trimmed mean (25%) of differences: -8.64124e-08s, 95% CI: -9.79358e-08s, -7.520178e-08s (±1.137e-08s)
Trimmed mean (45%) of differences: -9.07003e-08s, 95% CI: -1.00972e-07s, -7.931308e-08s (±1.083e-08s)
Trimean of differences: -9.07500e-08s, 95% CI: -1.02000e-07s, -8.025000e-08s (±1.087e-08s)

conf_interval_plot_trim_mean_45

and pairwise statistical results: report.txt

Which suggests that for the leakage to happen, the whole most significant word needs to be equal 0. This will make attacks against OAEP with 2048 bit keys rather impractical against 64 bit architectures. But it does mean that both 32 bit architectures, and less common key sizes, like 2049 or 2056 bit keys, are rather realistic to attack.

Note: I haven't excluded the possibility of leakage happening with probe 3 or probe 4 (16 and 32 most significant bits being zero respectively), but it does look exactly the same as the leakage pattern for CVE-2022-4304, where I did do that.

Nov 23 '23 12:11 tomato42

what's necessary, is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert the whatever arbitrary precision format is in use for the modulo exponentiation into constant length integers, then multiply them using real constant-time code, and finally do constant time reduction modulo. Something like what's in https://github.com/tomato42/ctmpi but in rust, not in C.

The core modpow operation in num-bigint-dig should already implemented in terms of Montgomery multiplication: https://github.com/dignifiedquire/num-bigint/blob/6f73f0a4025164325e85f8645dad6712b3110c51/src/monty.rs#L129

That alone is insufficient, however, per the OP. I'd generally worry that num-bigint is a general-purpose arbitrary precision bignum library that wasn't designed for constant-time operation.

I could potentially attempt to completely sidestep num-bigint for this, extracting the raw limbs out of the BigUints and performing modpow in code which has been carefully written to avoid any secret-dependent branching, similar to what we have already in crypto-bigint: https://github.com/RustCrypto/crypto-bigint/blob/a02c505/src/uint/modular/reduction.rs

Nov 23 '23 16:11 tarcieri

Rather than try to hunt down where timing sidechannels are occurring in num-bigint, I've been attempting to build out BoxedUint in crypto-bigint, which now has partial support for sharing the Montgomery multiplication implementation we already provide for the stack-allocated Residue and DynResidue types.

It's designed around a fixed (albeit dynamic) precision rather than arbitrary precision, and supports decoding BoxedUints with a precision specified a priori, allowing us to ensure that the number of limbs is constant and always equal to the limbs of the modulus. This should eliminate sidechannels due to fewer limbs because zeros are being stripped, which I suspect is what's happening per the earlier measurements.

If we go that route, it will be a somewhat invasive, breaking change, but also one we've been wanting to do for awhile (see #51). I'm open to alternative options which work with num-bigint-dig, but I'm not sure we'll actually solve this problem without switching to code which has been carefully written from the ground-up to operate in constant-time.

Nov 26 '23 23:11 tarcieri

I am fully in favor of finally switching 😅 thanks @tarcieri for pushing crypto-bigint forward!

Nov 27 '23 09:11 dignifiedquire

FWIW it looks like Go used to have similar problems due to their use of big.Int and solved the issue in a similar manner, by ripping out big.Int and replacing it with a carefully written internal cryptographic integer library: https://github.com/golang/go/commit/8a81fdf165facdcefa06531de5af98a4db343035

Nov 27 '23 14:11 tarcieri

I've opened a WIP @RustSec advisory here: https://github.com/rustsec/advisory-db/pull/1825

Nov 28 '23 04:11 tarcieri

@tarcieri I can probably spend some time helping on this, can you point me at the missing pieces/where you are currently at?

Nov 28 '23 07:11 dignifiedquire