rust-gpu icon indicating copy to clipboard operation
rust-gpu copied to clipboard

The Vec3/Vec3A byte size conundrum

Open hrydgard opened this issue 5 years ago • 6 comments

So, we currently have the Vec3 and Vec3A types from glam available.

Vec3 is simply three scalar float32 values, and Vec3A is a SIMD type where an actual vector with three floats can fit.

On CPU, this results in:

  • Vec3 is (f32, f32, f32), 12 bytes
  • Vec3A is NEON or SIMD register, 16 bytes

On GPU/SPIR-V, we have: (EDITED)

  • Vec3/Vec3A are both a SPIR-V vec3 value (TypeVector x(float) 3). 12 bytes (but with special alignment requirements depending on buffer type/layout mode).
    • Having this type (and not just scalar) is required as an argument for some texture sampling functions, etc

The biggest issue I have with this is of course that on CPU a Vec3A occupies 16 bytes while on GPU it may occupy 12 or 16 bytes depending on surrounding declaration and buffer layout mode.

This all feels quite unsatisfactory, but I am not sure what the best course of action to clean this up would be.

hrydgard avatar Jan 13 '21 12:01 hrydgard

As of https://github.com/bitshifter/glam-rs/pull/106 (early december), Vec3 and Vec3A are identical on the GPU, both are TypeVector.

khyperia avatar Jan 13 '21 12:01 khyperia

The Vulkan spec requires Vec3 to be align_of<Vec4>() but the spare 4 bytes are usable, expressed like so in CPU code:

#[repr(align(16))]
#[repr(C)]
struct Foo {
  v: Vec3, // align_of<Vec3>() == 4 on the CPU, 16 on GPU with VK, 4 on GPU with VK_EXT_scalar_layout
  f: f32,
}

But the spirv from that is:

OpMemberDecorate %Foo 0 Offset 0
OpMemberDecorate %Foo 1 Offset 16    // This Offset could be 12, and size_of::<Foo>() could be 16 instead of 32
%float = OpTypeFloat 32
%v3float = OpTypeVecctor %float 3
%Foo = OpTypeStruct %v3float %float

because Vec3 needs to be #[repr(simd)] so it's represented as an OpTypeVector, which makes it align(16), size is an integer multiple of alignment, and offset is the sum of the sizes of the previous members.

SPIR-V itself does not define a struct representation, punting it to the application API, so there are other potential layouts to further complicate things. Vulkan has a couple extensions (eg. VK_EXT_scalar_layout) that change the layout rules, openGL has std150 and std430, openCL does it's own thing.

Ideally, the type definitions for CPU and GPU can be the same code, without needing to fiddle with #[cfg_attr(target_arch= ..., repr(...)] everywhere. For that to work, the layout needs to be the same across different compilation units using different codegen backends, like #[repr(C)] but with the layout guarantees set by some crate level flag (maybe hijack one of the unused target-triple flags?). This #[repr(spirv)] would need to be aware of OpTypeVector members to handle them properly, which may require changing how OpTypeVector is produced (iirc eddyb and khyperia were discussing LLVM being able to have size 12 align 16 vector types, so hopefully not).

I mentioned this on Discord the other day, but it was very early so I didn't really explain it.

e: Another wrinkle is Mat3, which has the alignment of 3 Vec3s, meaning it has 3 size 4, align 4 padding slots that could be filled:

#[repr(spirv)]
struct Foo {
    m: Mat3,
    a: f32, b: f32, c: f32,
}

could hopefully be

[[m.x_axis.xyz, a],
[m.y_axis.xyz, b],
[m.z_axis.xyz, c]],

Hentropy avatar Jan 15 '21 16:01 Hentropy

The Vulkan spec requires Vec3 to be align_of<Vec4>() but the spare 4 bytes are usable, expressed like so in CPU code:

#[repr(align(16))]
#[repr(C)]
struct Foo {
  v: Vec3, // align_of<Vec3>() == 4 on the CPU, 16 on GPU with VK, 4 on GPU with VK_EXT_scalar_layout
  f: f32,
}

This is only true for Vec3s within descriptors (uniform buffers etc). For Vec3s from a vertex buffer, it needs an align(4) 3 component vector type

fu5ha avatar Jan 26 '21 15:01 fu5ha

I have been trying and failing to do this for a little while now: I want a type that on the CPU is represented as

#[repr(align(16))]
#[repr(C)]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32
}

and that maintains that layout on the GPU, so that I can use the same type on both ends of a storage/uniform buffer. To my understanding, none of the glam types have this representation on the CPU and writing this directly fails in spirv as #[repr(simd)] is necessary for copying to work. I could try something like

#[cfg_attr(not(target_arch = "spirv"), repr(align(16)), repr(C))]
#[cfg_attr(target_arch = "spirv", repr(simd))]
struct Vec3 {
    x: f32,
    y: f32,
    z: f32
}

but I think that bumps the size up to 16 bytes in spirv, preventing use of the spare bytes and leaving me at square one.

This is clearly possible somehow, as this align=16, size=12 is how a vec3 behaves in GLSL. Is what I am trying to do possible in rust-gpu and if not, what is blocking it?

lokegustafsson avatar Apr 29 '21 08:04 lokegustafsson

This is clearly possible somehow, as this align=16, size=12 is how a vec3 behaves in GLSL. Is what I am trying to do possible in rust-gpu and if not, what is blocking it?

The thing blocking it is that rustc (and tbh, a huge chunk of CPU-side compiler toolchains in general) does not support the concept of align=16 and size=12 - a assumption baked into a lot of code is that "size is always a multiple of alignment". We would need to make rust (and rustc) support that style of layout before we're able to implement it in rust-gpu.


Thinking about this further, the best path forward, for now, is probably a validation pass in rust-gpu that looks at all the interface parameters and the like, computes std140/std430 layout, and gives a warning or error if there's a mismatch, so the user can known to use Vec4 instead of Vec3 or whatever. However, more investigation is needed because rust-gpu should be emitting Offset decorations to every struct it emits, with rules of align=16 size=16 for Vec3, and the wacky size=12 rule shouldn't be coming into play anywhere, but it clearly is somehow because people are hitting bugs when they use Vec3. So, maybe a validation pass isn't the right way to go, or maybe it is, more investigation is needed on what exactly is causing issues when using Vec3. (Maybe the CPU side is assuming size=12 and mangling structs to fit that, even though the original data is size=16 and the GPU side is also size=16?)

khyperia avatar May 17 '21 08:05 khyperia

ah, https://github.com/EmbarkStudios/rust-gpu/issues/575 is probably a better, more thought-out way of doing this - so linking these issues together

khyperia avatar May 17 '21 08:05 khyperia

Closing this in favor of #575

oisyn avatar Nov 17 '22 22:11 oisyn