Add configurable side-channel mitigations; enable them on soft AES
Our software AES implementation doesn't have any mitigations against side channels.
Go's generic implementation is not protected at all either, and even OpenSSL only has minimal mitigations.
Full mitigations against cache-based attacks (bitslicing, fixslicing) come at a huge performance cost, making AES-based primitives pretty much useless for many applications. They also don't offer any protection against other classes of side channel attacks.
In practice, partially protected, or even unprotected, implementations are not as bad as it sounds. Exploiting these side channels requires an attacker able to submit many plaintexts/ciphertexts and perform accurate measurements. Noisy measurements can still be exploited, but require a significant number of attempts. Whether this is exploitable depends on the platform, the application, and the attacker's proximity.
So, some libraries made the choice of minimal mitigations, and some use better mitigations in spite of the performance hit. It's a tradeoff (security vs. performance), and there's no one-size-fits-all implementation.
What applies to AES applies to other cryptographic primitives.
For example, RSA signatures are very sensitive to fault attacks, whether or not they use the CRT. A mitigation is to verify every produced signature. That also comes with a performance cost. Whether to do it or not depends on whether fault attacks are part of the threat model.
Thanks to Zig's comptime, we can try to address these different requirements.
This PR adds a side_channels_protection global, that can later be complemented with fault_attacks_protection and possibly other knobs.
It can have 4 different values:
- `none`: doesn't enable any additional mitigations. "Additional", because mitigations that don't have a significant performance cost remain enabled; for example, authentication tags are still checked in constant time.
- `basic`: enables mitigations protecting against attacks in a common scenario, where an attacker doesn't have physical access to the device, cannot run arbitrary code on the same thread, and cannot conduct brute-force attacks without being throttled.
- `medium`: enables additional mitigations, offering practical protection in a shared environment.
- `full`: enables all the mitigations we have.
The tradeoff is that the more mitigations we enable, the bigger the performance hit. But this lets applications choose what's best for their use case.
medium is the default.
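As a rough sketch of how such a comptime knob could drive primitive code, consider the following (only the `side_channels_protection` name and its four levels come from this PR; the enum name, the `lookup` helper, and where the option is declared are all hypothetical):

```zig
const std = @import("std");

/// Hypothetical type name; the four levels match the ones described above.
pub const SideChannelsProtection = enum { none, basic, medium, full };

/// `medium` is the default level in this PR.
pub const side_channels_protection: SideChannelsProtection = .medium;

/// Scan the whole table and mask out everything but the wanted entry,
/// so the memory access pattern is independent of `index`.
fn constantTimeLookup(table: []const u32, index: u8) u32 {
    var out: u32 = 0;
    for (table, 0..) |v, i| {
        // mask is all-ones when i == index, zero otherwise
        const mask = @as(u32, 0) -% @as(u32, @intFromBool(i == index));
        out |= v & mask;
    }
    return out;
}

/// The level is known at comptime, so the unprotected path compiles
/// down to a plain indexed load with no runtime branch.
fn lookup(table: []const u32, index: u8) u32 {
    return switch (side_channels_protection) {
        .none => table[index],
        else => constantTimeLookup(table, index),
    };
}
```

Because the switch operates on a comptime-known constant, each mitigation level pays only for the code it actually selects.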
Currently, this only affects software AES, but that setting can later be used by other primitives.
For AES, our implementation is a traditional table-based one, with four 32-bit tables and an sbox.
Lookups in these tables have been replaced by function calls. These functions can add a configurable noise level, making cache-based attacks more difficult to conduct.
In the none mitigation level, the behavior is exactly the same as before. Performance also remains the same.
In other levels, we compress the T tables into a single one, and read data from multiple cache lines (all of them in full mode), for all bytes in parallel. More precise measurements and way more attempts become necessary in order to find correlations.
In addition, we use distinct copies of the sbox for key expansion and encryption, so that they don't share the same L1 cache entries.
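To illustrate the idea (this is not the PR's actual code, just a minimal sketch): a lookup that touches every entry of a compressed table makes the cache footprint independent of the secret index, at the cost of reading far more memory per lookup.

```zig
const std = @import("std");

/// Illustrative only: read every entry of a compressed 256-byte table,
/// so every cache line it spans is touched on each lookup, and select
/// the wanted byte with a mask. Which entry was needed is not visible
/// in the cache access pattern.
fn protectedLookup(table: *const [256]u8, index: u8) u8 {
    var out: u8 = 0;
    var i: usize = 0;
    while (i < 256) : (i += 1) {
        // mask is 0xFF when i == index, 0x00 otherwise
        const mask: u8 = @as(u8, 0) -% @as(u8, @intFromBool(i == index));
        out |= table[i] & mask;
    }
    return out;
}
```

The intermediate levels described above can trade some of this cost back by reading only one entry per cache line instead of every entry.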
The best known attacks target the first two AES rounds, or the last one.
While future attacks may improve on this, AES achieves full diffusion after 4 rounds. So, we can relax the mitigations after that. This is what this implementation does, enabling mitigations again for the last two rounds.
In full mode, all the rounds are protected.
The protection assumes that lookups within a cache line are secret. The CacheBleed attack showed that this assumption can be circumvented, but that requires an attacker able to abuse hyperthreading and run code on the same core as the encryption, which is rarely a practical scenario.
Still, the current AES API allows us to transparently switch to using fixslicing/bitslicing later when the full mitigation level is enabled.
We will eventually need to run all the crypto tests with all these different settings. Currently, the different mitigation levels share the same code, so it may not be strictly necessary. But as soon as very different code paths (e.g. for bitslicing) are introduced, this is something we'll have to do.
We can probably find a better name than full for the highest level of mitigations. full could imply that applications are fully protected, while this is not the case. It just enables all the mitigations we have.
Would it make sense to allow more fine-grained configuration, where side_channels_protection (and any other such option) allows selecting specific mitigations in addition to the levels? Maybe a union(enum) with the last variant being a struct whose fields are named after mitigations and work as toggles?
A given mitigation doesn't necessarily have the same impact on all functions. It also doesn't have the same impact on all platforms.
Having different classes instead, for well-defined use cases, allows a single setting to be applied to everything in the standard library, no matter what the individual mitigations are.
This also enables us to change/upgrade the mitigations later as long as they provide the same security, without breaking anything. If a mitigation turns out to be ineffective, it can be replaced or augmented. From an application perspective, all that has to be done is choose what level to use.
We'll try to match the promises of a mitigation level, and make changes accordingly if new attack vectors are later discovered.
This is also way simpler to use for applications.
There's also a broader plan behind this, which is to formally define these security targets, and use them in cryptographic specifications.
Also use little-endian representation for the lookup tables.
Reference AES code uses big-endian, but it was written when a significant amount of CPUs were still big-endian. It's not really the case any more, so we'd better optimize for little-endian.
I disagree with the setting being global. It does not properly model the situation. It's easy to come up with a use case that uses the same cryptographic algorithm in two different places, one that requires mitigations and one that does not.
In that case, picking the lowest common denominator would be reasonable.
What alternative would you suggest?
Adding a context parameter to all functions would be a major breaking change.
I'm putting my foot down on this one. No global to control mitigations.
It's OK to break the API. When a user creates an instance of a cryptographic function, they must choose to opt into mitigations or not. This could be done with a context parameter as you suggested, or it could be done via choosing a different API. For example std.crypto.aes (API remains unbroken) vs std.crypto.aes_mitigated (this one can have the fancy options).
I also don't see any value in the basic or medium enum tags here. If it's going to be a tradeoff, the user probably wants to be specific about what mitigations are being opted into or out of. I suggest that the mitigation configuration either starts with an empty set or full set and then exceptions are made for specific mitigations.
This is not specifically about AES. In fact, applications are very unlikely to ever use crypto.core.aes directly. The setting would affect the entire std.crypto. namespace, so adding a context would break all of it.
Having to add that parameter for every operation would also be a little bit cumbersome.
The types and/or representations won't necessarily be the same according to that setting. The representation of an AES block with bitslicing is incompatible with the same block without bitslicing, and the entire AES encryption state with fixslicing is also incompatible with both of them. So, having that setting on a per-function basis may cause bugs and confusion.
The basic and medium settings are what most applications should actually use. They represent the best balance between speed and security. A formal definition, and clear guidance about when to use what level, is still a work in progress. This work is going to be presented at HACS in Tokyo in March 2023, and having Zig as an actual implementation in a standard library would be nice.
In BoringSSL, this is going to be a compile-time flag. Maybe we could do the same, so that different libraries can use different settings?
Having applications choose the exact mitigations does not work, as the exact mitigations to meet a given level are arch-specific, and can change over time as new attacks and defenses are discovered.
In the meantime, and going back to soft AES specifically, we need to decide what to do by default. The status quo is an implementation that is fast, but leaks the key with the textbook cache attack against AES.
Systems without hardware AES acceleration are getting rare. Even cheap microcontrollers now have AES acceleration. But a major exception is WebAssembly.
@jedisct1 just to clarify, I meant the following:
```zig
const Level = union(enum) {
    none,
    basic,
    medium,
    full,
    custom: Mitigations,

    const level_none: Mitigations = .{};
    const level_basic: Mitigations = .{
        .example_a = true,
        .example_b = true,
    };
    const level_medium: Mitigations = .{
        .example_a = true,
        .example_b = true,
        .example_c = true,
    };
    const level_full: Mitigations = .{
        .example_a = true,
        .example_b = true,
        .example_c = true,
        .example_d = true,
    };

    pub const Mitigations = packed struct {
        /// reasons why and warnings
        example_a: bool = false,
        example_b: bool = false,
        example_c: bool = false,
        example_d: bool = false,
    };

    pub fn get(self: Level) Mitigations {
        return switch (self) {
            .none => level_none,
            .basic => level_basic,
            .medium => level_medium,
            .full => level_full,
            .custom => |m| m,
        };
    }
};
```
@tauoverpi Here's why that wouldn't work, taking RSA as an example.
We can use std.math.big to implement RSA. However, std.math.big is not constant-time.
The common mitigation in that case is to use blinding: secret inputs are combined with a fresh random value before the secret operation, and the randomization is undone once the computation is done.
So, Mitigations would have a .use_blinding member.
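As a concrete illustration, one common variant (base blinding) picks a fresh random $r$ with $\gcd(r, n) = 1$ before each private-key operation:

```latex
s \equiv (m \cdot r^{e})^{d} \cdot r^{-1}
  \equiv m^{d} \cdot r^{ed} \cdot r^{-1}
  \equiv m^{d} \pmod{n}
```

since $r^{ed} \equiv r \pmod{n}$. The exponentiation then operates on $m \cdot r^{e}$, which the attacker cannot predict, so timing variations no longer correlate with the attacker-chosen input.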
But wait. With our ECC implementations, arithmetic is constant-time. So using blinding here would not be useful. Mitigations should be algorithm-specific, and the member should then be:
.use_blinding_for_rsa
Another mitigation would be to implement a subset of std.math.big that is constant-time. This is the route currently being taken by Go and Rust, and there are no reasons not to eventually do the same thing in Zig. For applications, that should be transparent. Except that:
.use_blinding_for_rsa now does nothing. And we can't really remove it after 1.0 is tagged. But let's assume we can, and it gets replaced by:
.use_constant_time_arithmetic_for_rsa
Everything's great. But wait. The constant-time arithmetic code probably assumes that additions and multiplications of registers with a carry are constant-time operations. Unfortunately, this is not the case on some microcontrollers.
Not to worry, on these platforms .use_constant_time_arithmetic_for_rsa can use a different code path and representation, that uses bigger limbs with top bits always cleared in order to avoid carries.
But doing that may be slower than using the old std.math.big code with blinding. So we should anticipate and keep .use_constant_time_arithmetic_for_rsa. Until we manage to optimize the constant-time version so that it gets competitive with blinding.
My point is that allowing applications to choose the exact mitigations that are applied is not future-proof. It would prevent us from implementing new mitigations or removing old ones. And it is too demanding of application developers, who would have to understand a lot of details in order to get the guarantees they need.
@jedisct1 -
Given the fact that you have a vision for std.crypto, and have been executing that vision expertly and diligently, I'd like to withdraw my request and give you jurisdiction over these matters, at least for now.
The one caveat that I want to be clear about: we are still going to have a massive std lib audit before 1.0, and I do reserve the right to revisit this issue when that time comes. At that time we can discuss the matter and see all of the tradeoffs clearly, as well as have the benefit of wisdom from trying out one particular configuration.
At least for now, let's explore your vision. Please feel free to rebase and merge this branch.
please note this: #14181
Oh, right! Thanks for the reminder.
I'm going to move that.