Switch from custom ASM code to Nadeko
Are you suggesting using Nadeko just for the constant time comparison functions or as part of implementations of the algorithms?
I tried something "Nadeko-like" a few months ago. One of the things that's tricky to guarantee is that some of the algorithms that are written to be constant time actually are after LLVM finishes optimizing. Anyway, what I tried to do was to newtype the u32x4 type and to implement all of the operations on it via asm!() blocks to hide them from LLVM's optimizer. I then updated the "Aessafe" implementation to use this new type (if you aren't familiar with that, it is an AES implementation in Rust-Crypto implemented using bitslicing which in theory is constant time. In practice, with the last recent LLVM version that I tested, it is constant time, but, there is no guarantee that a new LLVM version won't break that). Anyway, I ran into a 30% (roughly, as I remember) performance penalty that I couldn't figure out how to get rid of. As best as I could tell, it was due to a few different factors:
- On X86, many instructions take two parameters - an input parameter and an in/out parameter. For instructions like
add, it doesn't matter if you doadd eax, ecxoradd ecx, eax- in both cases, you'll add the contents of those registers. However, it does matter for register scheduling purposes - ideally, you want to overwrite the register that won't be used again. However, there isn't any way to express in anasm!()block that LLVM can re-order the operands. As a result, you can end up with it saving and restoring registers needlessly. This can be worked around to an extent by carefully ordering the operands manually - however, this requires a bit of black magic since you need to guess what LLVM is going to do and then write code that hopefully complements it. Its hard to do and I don't know if its possible to do when taking multiple architectures into account. - LLVM's
asm!()block is a bit weaker than GCCs. In GCC, its possible to say that an operand can either be in memory or be a register. GCC will then try to use a register, and if it can't (or its more efficient), fall back to using a memory location. LLVM accepts the syntax. However, it just falls back to always using the operand from memory. As a result, LLVM will end up copying a register value into the stack before invoking anasm!()block if you use that syntax when it would be more efficient just to use the register directly. The only way to prevent his is to not specify that a memory location is acceptable. The result is that LLVM can't use memory locations directly even when they would be more efficient. - It looks to me like LLVM tries to emit operations in such a way as to maximize the available operational units on the CPU. So, if you have ADD, ADD, ADD, MUL, I think LLVM will try to re-order that to ADD, MUL, ADD, ADD if it can in order to do the MUL and ADD at the same time. Once all of the operations are turned into opaque
asm!()blocks, it can't do this any more. Again, this is something that can be worked around to an extent by re-ordering operations in Rust code, but, its a bit of black magic that can also make the code harder to read. - I think there was a 4th thing. I can't remember it now, though.
I don't know how much each of those issues contributed to the ~30% performance penalty that I saw. I gave up on the approach, though, since I couldn't figure out how to make it work.
Anyway, I'm not against using Nadeko, as long as:
- There isn't a performance penalty for doing so
- We continue to support all the architectures that we do right now (X86 / X86_64 / ARM). If the use is limited to just the constant time comparison functions, that would probably make it easier to do. Ideally, we would also have a way to support other architectures as well.
- We can use Cargo (it doesn't appear that Nadeko is currently Cargo packaged)
- We have a way to support Rust 1.0 (Nadeko currently uses
asm!()blocks, I believe, which will be unavailable with Rust 1.0).