Quantity::new syntax is not zero-cost on integer base units

Open Chippit opened this issue 3 years ago • 3 comments

Given the following test code, I expect all of these functions to generate the same ASM. However, when using the base types i32 and i16 (the two I have tested), the third function generates thousands more ASM instructions than the first two (which are identical to each other).

pub fn test_i32(a: i32, b: i32) -> i32 {
    let power_1 = a;
    let power_2 = b;

    power_1 - power_2
}

pub fn test_struct_i32(a: i32, b: i32) -> i32 {
    let power_1 = uom::si::i32::Power {
        dimension: core::marker::PhantomData,
        units: core::marker::PhantomData,
        value: a
    };
    let power_2 = uom::si::i32::Power {
        dimension: core::marker::PhantomData,
        units: core::marker::PhantomData,
        value: b
    };

    (power_1 - power_2).value
}

pub fn test_new_i32(a: i32, b: i32) -> i32 {
    let power_1 = uom::si::i32::Power::new::<uom::si::power::watt>(a);
    let power_2 = uom::si::i32::Power::new::<uom::si::power::watt>(b);

    (power_1 - power_2).value
}

All three functions produce the same ASM output when using the f32 base type.

Cargo.toml

[dependencies.uom]
version = "0.33.0"
default-features = false
features = [
    "i16", "f32", "i32",
    "si"
]

$ rustc --version
rustc 1.62.1 (e092d0b6b 2022-07-16)

Chippit • Aug 22 '22 15:08

My guess is that the to_base call in new() isn't being optimized out, even when the given unit matches the base unit. Non-floating-point underlying storage types haven't gotten the same attention that floating-point types have. See #261, where I would like to add tests to ensure zero-cost code gets generated. At the time that issue was created, code generation for f32 worked, but I never tested other types.

https://github.com/iliekturtles/uom/blob/10d9679193775aae04987338a31014524bd5674d/src/quantity.rs#L208-L223
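
To make the guess above concrete, here is a minimal, self-contained sketch of why an identity conversion can still cost something for integer types. It is not uom's actual code: the `Ratio` type, `gcd`, `times`, and `to_base` below are hypothetical stand-ins, and the sketch assumes the integer conversion factor is represented as a reduced rational number. Under that assumption, even a 1/1 factor sends the value through a multiplication, a gcd loop, and a division, and whether the optimizer folds all of that away is not guaranteed.

```rust
// Hypothetical stand-in for a rational conversion factor (NOT uom's real type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Ratio {
    numer: i32,
    denom: i32,
}

// Euclid's algorithm: a data-dependent loop, so its cost depends on the
// runtime value unless the compiler can prove the inputs are trivial.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a.abs()
}

impl Ratio {
    // Builds a ratio in lowest terms. Assumes `denom != 0`.
    fn new(numer: i32, denom: i32) -> Self {
        debug_assert!(denom != 0);
        let g = gcd(numer, denom);
        Ratio { numer: numer / g, denom: denom / g }
    }

    // Multiplies two ratios and reduces the result again.
    fn times(self, rhs: Ratio) -> Ratio {
        Ratio::new(self.numer * rhs.numer, self.denom * rhs.denom)
    }

    // Truncating conversion back to the integer storage type.
    fn to_integer(self) -> i32 {
        self.numer / self.denom
    }
}

// Hypothetical conversion to the base unit: value * unit coefficient.
// With a coefficient of 1/1 the result equals the input, but the work
// above is still expressed in the code that the optimizer sees.
fn to_base(value: i32, unit_coefficient: Ratio) -> i32 {
    Ratio::new(value, 1).times(unit_coefficient).to_integer()
}

fn main() {
    let watt = Ratio::new(1, 1); // base unit: identity factor
    let kilowatt = Ratio::new(1000, 1); // assumed factor for illustration

    assert_eq!(to_base(42, watt), 42);
    assert_eq!(to_base(42, kilowatt), 42_000);
    println!("identity and scaled conversions behave as expected");
}
```

With a floating-point factor the identity case is a single multiplication by 1.0, which is much easier for the compiler to eliminate; that difference would be consistent with the f32 versions compiling to identical ASM while the integer versions do not.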

iliekturtles • Aug 23 '22 13:08

I think the same thing might also explain other behaviour that I saw (which I didn't log as an issue because I wasn't entirely confident it wasn't expected in that case):

When the autoconvert feature is enabled, Eq implementations on u16 units were also resulting in significant cost. In my application I didn't actually need autoconvert, so I didn't dwell on it, but from a brief peek into the source I suspect to_base is the culprit there too.

Chippit • Aug 23 '22 13:08

Hmm, I have just run into the same thing. I use uom in embedded calculations on an ESP32. I am using i32 as the base type, and Quantity<...>::new was very slow to execute compared to using raw i32. I was using opt-level = 3, but it seems that to_base was not optimized out by the compiler.

GamePad64 • Feb 07 '24 21:02