sysgpu: Examples that allocate memory on the Vulkan backend crash with an out-of-device-memory error
System details:
zig version: 0.14.0-dev.2577+271452d22
mach: b14f8e69ee8eb834695eb0d0582053e555d10156
Vulkan driver version: 53.308.0
GPU: NVIDIA GeForce RTX 3080
OS: NixOS 24.11.7
Steps to reproduce
Build and run an example that allocates memory (zig build run-glyphs, run-hardware-check, run-sprite, etc.)
Observe the following output and stack trace:
error(mach): Server Side Decorations aren't supported
Falling back to X11
info(mach): found Vulkan backend on Discrete GPU adapter: NVIDIA GeForce RTX 3080, Vulkan driver version 53.308.0
warning(mach): You are using the wayland backend, which is currently experimental as we continue to rewrite Mach in Zig instead of using C libraries like GLFW/etc. The following features are expected to not work:
* Resizing window
* Changing display mode
* VSync
* Setting window border/cursor
Contributions welcome!
thread 26649 panic: api error
/home/shail/.cache/zig/p/12208333c8b3551908b66b8a421e7127bdf0c1806063836576283860eff99c99c521/vk.zig:29926:54: 0x123a511 in allocateMemory (glyphs)
Result.error_out_of_device_memory => return error.OutOfDeviceMemory,
^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:1191:24: 0x123c348 in init (glyphs)
const memory = try vkd.allocateMemory(vk_device, &.{
^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:943:28: 0x123d7ab in acquire (glyphs)
const buffer = try Buffer.init(device, &.{
^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:2345:28: 0x123db60 in upload (glyphs)
const buffer = try streaming_manager.acquire();
^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:2664:24: 0x12d0775 in writeBuffer (glyphs)
const stream = try encoder.command_buffer.upload(size);
^
/home/shail/GitHub/mach/src/sysgpu/main.zig:317:88: 0x123837f in renderPipeline (glyphs)
command_encoder.writeBuffer(buffer, buffer_offset, @ptrCast(data), size) catch @panic("api error");
^
/home/shail/GitHub/mach/src/gfx/Sprite.zig:205:23: 0x122e5c8 in tick (glyphs)
renderPipeline(sprite, core, pipeline_id);
^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x11fd776 in run__anon_67657 (glyphs)
switch (@typeInfo(Ret)) {
^
/home/shail/GitHub/mach/src/module.zig:785:57: 0x11cedb5 in callDynamic (glyphs)
inline else => |fn_name| mod.run(fn_name),
^
/home/shail/GitHub/mach/src/module.zig:748:61: 0x118e595 in run (glyphs)
modules2.callDynamic(fn_id);
^
/home/shail/GitHub/mach/src/module.zig:502:19: 0x12dbf29 in run (glyphs)
r._run(r._ctx, fn_id);
^
/home/shail/GitHub/mach/src/module.zig:507:18: 0x125593c in call__anon_76619 (glyphs)
r.run(fn_id);
^
/home/shail/GitHub/mach/examples/glyphs/App.zig:318:20: 0x125493f in tick (glyphs)
sprite_mod.call(.tick);
^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x11fdaac in run__anon_67661 (glyphs)
switch (@typeInfo(Ret)) {
^
/home/shail/GitHub/mach/src/module.zig:785:57: 0x11cefa0 in callDynamic (glyphs)
inline else => |fn_name| mod.run(fn_name),
^
/home/shail/GitHub/mach/src/module.zig:748:61: 0x118e595 in run (glyphs)
modules2.callDynamic(fn_id);
^
/home/shail/GitHub/mach/src/module.zig:502:19: 0x11b4709 in run (glyphs)
r._run(r._ctx, fn_id);
^
/home/shail/GitHub/mach/src/Core.zig:259:17: 0x11b4672 in main (glyphs)
core_mod.run(core.on_tick.?);
^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x116c63f in run__anon_10172 (glyphs)
switch (@typeInfo(Ret)) {
^
/home/shail/GitHub/mach/src/module.zig:723:40: 0x1166107 in run__anon_6929 (glyphs)
callMod.run(callFn);
^
/home/shail/GitHub/mach/src/entrypoint/main.zig:13:12: 0x116521a in main (glyphs)
app.run(.main);
^
/home/shail/Apps/zig-linux-x86_64-0.14.0-dev.2577+271452d22/lib/std/start.zig:656:37: 0x11664ce in main (glyphs)
const result = root.main() catch |err| {
^
???:?:?: 0x7fe4e1a9e27d in ??? (libc.so.6)
Unwind information for `libc.so.6:0x7fe4e1a9e27d` was not available, trace may be incomplete
run-glyphs
└─ run glyphs failure
error: the following command terminated unexpectedly:
/home/shail/GitHub/mach/zig-out/bin/glyphs
Notes
I added some debug logging, and it appears that findBestAllocator (https://github.com/hexops/mach/blob/main/src/sysgpu/vulkan.zig#L3545) always returns 1 (a MemoryKind of linear). However, when I check out https://github.com/hexops/mach/pull/1349/commits/e4f09e94fa9e7f9b2e37bd04f456c07edd783e13 from the libdecor PR and force the allocator to 3 (linear_write_mappable), the glyphs and hardware-check demos work for me.
(I checked out the libdecor branch to try out Wayland; otherwise I'm on X11, which throws an out-of-date KHR error when I force the alternative allocator index.)
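For reference, here is a minimal sketch of how a Vulkan memory type is usually picked for a buffer the CPU writes into: walk the device's memory types, skip those the resource does not allow, and take the first whose property flags include everything required. This is not Mach's findBestAllocator. MemoryPropertyFlags, MemoryType, findMemoryType, and the example_types table are hypothetical stand-ins so the snippet runs without a GPU, and the indices it returns are Vulkan-style memory type indices, not Mach's MemoryKind values.

```zig
const std = @import("std");

// Hypothetical stand-in for VkMemoryPropertyFlags (only the first three bits).
const MemoryPropertyFlags = packed struct(u32) {
    device_local: bool = false, // bit 0: GPU-side memory
    host_visible: bool = false, // bit 1: can be mapped by the CPU
    host_coherent: bool = false, // bit 2: no explicit flush after CPU writes
    _reserved: u29 = 0,
};

// Hypothetical stand-in for VkMemoryType.
const MemoryType = struct {
    property_flags: MemoryPropertyFlags,
};

// Made-up table for illustration; a real one comes from the physical device.
const example_types = [_]MemoryType{
    .{ .property_flags = .{ .device_local = true } },
    .{ .property_flags = .{ .host_visible = true, .host_coherent = true } },
    .{ .property_flags = .{ .device_local = true, .host_visible = true, .host_coherent = true } },
};

/// Returns the index of the first memory type that is permitted by
/// `type_bits` (one bit per type index) and has every flag in `required`
/// set, or null if no type qualifies.
fn findMemoryType(types: []const MemoryType, type_bits: u32, required: MemoryPropertyFlags) ?u32 {
    const want: u32 = @bitCast(required);
    for (types, 0..) |t, i| {
        if (i >= 32) break; // Vulkan exposes at most 32 memory types
        const allowed = ((type_bits >> @as(u5, @intCast(i))) & 1) != 0;
        const have: u32 = @bitCast(t.property_flags);
        if (allowed and (have & want) == want) return @as(u32, @intCast(i));
    }
    return null;
}

pub fn main() void {
    // An upload (staging) buffer must be mappable, so require host-visible memory.
    const upload = findMemoryType(&example_types, 0b111, .{ .host_visible = true, .host_coherent = true });
    std.debug.print("upload buffer memory type: {any}\n", .{upload});
}
```

If the selection logic never asks for host_visible when the buffer is going to be mapped, it can land on a type that does not suit the buffer's use, which would be one thing to check in the real function.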
I'm printing out the size of each attempted allocation, and they're pretty small (64 MiB, i.e. 67108864 bytes, at most):
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 33554432
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 33554432
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 4194304
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 128
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864
info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864
thread 33431 panic: api error
/home/shail/.cache/zig/p/12208333c8b3551908b66b8a421e7127bdf0c1806063836576283860eff99c99c521/vk.zig:29926:54: 0x123a911 in allocateMemory (glyphs)
Result.error_out_of_device_memory => return error.OutOfDeviceMemory,
So it doesn't seem like an actual OOM issue; maybe findBestAllocator isn't returning the correct memory kind?
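As a sanity check on the numbers, the sizes in the debug log above can be tallied with a few lines of Zig. The logged_sizes array is copied by hand from that log, and totalBytes is a helper name made up for this sketch.

```zig
const std = @import("std");

// Sizes in bytes, copied from the debug log in the order they were printed.
const logged_sizes = [_]u64{ 67108864, 33554432, 33554432, 4194304, 128, 67108864, 67108864 };

// Hypothetical helper: sums the requested sizes.
fn totalBytes(sizes: []const u64) u64 {
    var total: u64 = 0;
    for (sizes) |s| total += s;
    return total;
}

pub fn main() void {
    const mib = 1024 * 1024;
    const largest = std.mem.max(u64, &logged_sizes);
    const total = totalBytes(&logged_sizes);
    std.debug.print("largest request: {d} bytes ({d} MiB)\n", .{ largest, largest / mib });
    std.debug.print("total requested: {d} bytes (about {d} MiB)\n", .{ total, total / mib });
}
```

The largest single request is 64 MiB and all logged requests together come to roughly 260 MiB, counting the final request that failed.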
Interestingly, the core-triangle example doesn't error out and doesn't hit my debug logging, so I guess it doesn't allocate?
You're not alone here at least :) https://discord.com/channels/996677443681267802/996677444360736831/1328445547748528231
Well, glad it's not just me then! Will try and poke around with the Vulkan backend; probably some 'fun' interaction between Vulkan and NVIDIA causing issues :)