
sysgpu: Examples that allocate against vulkan backend crash with out of memory error

Open shailpatels opened this issue 11 months ago • 2 comments

System details:

* zig version: 0.14.0-dev.2577+271452d22
* mach: b14f8e69ee8eb834695eb0d0582053e555d10156
* Vulkan driver version: 53.308.0
* GPU: NVIDIA GeForce RTX 3080
* OS: NixOS 24.11.7

Steps to reproduce: build and run an example that allocates memory (zig build run-[glyphs,hardware-check,sprite,etc]).

See the following stacktrace:

error(mach): Server Side Decorations aren't supported

Falling back to X11

info(mach): found Vulkan backend on Discrete GPU adapter: NVIDIA GeForce RTX 3080, Vulkan driver version 53.308.0

warning(mach): You are using the wayland backend, which is currently experimental as we continue to rewrite Mach in Zig instead of using C libraries like GLFW/etc. The following features are expected to not work:

* Resizing window
* Changing display mode
* VSync
* Setting window border/cursor

Contributions welcome!

thread 26649 panic: api error
/home/shail/.cache/zig/p/12208333c8b3551908b66b8a421e7127bdf0c1806063836576283860eff99c99c521/vk.zig:29926:54: 0x123a511 in allocateMemory (glyphs)
                Result.error_out_of_device_memory => return error.OutOfDeviceMemory,
                                                     ^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:1191:24: 0x123c348 in init (glyphs)
        const memory = try vkd.allocateMemory(vk_device, &.{
                       ^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:943:28: 0x123d7ab in acquire (glyphs)
            const buffer = try Buffer.init(device, &.{
                           ^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:2345:28: 0x123db60 in upload (glyphs)
            const buffer = try streaming_manager.acquire();
                           ^
/home/shail/GitHub/mach/src/sysgpu/vulkan.zig:2664:24: 0x12d0775 in writeBuffer (glyphs)
        const stream = try encoder.command_buffer.upload(size);
                       ^
/home/shail/GitHub/mach/src/sysgpu/main.zig:317:88: 0x123837f in renderPipeline (glyphs)
        command_encoder.writeBuffer(buffer, buffer_offset, @ptrCast(data), size) catch @panic("api error");
                                                                                       ^
/home/shail/GitHub/mach/src/gfx/Sprite.zig:205:23: 0x122e5c8 in tick (glyphs)
        renderPipeline(sprite, core, pipeline_id);
                      ^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x11fd776 in run__anon_67657 (glyphs)
                    switch (@typeInfo(Ret)) {
                            ^
/home/shail/GitHub/mach/src/module.zig:785:57: 0x11cedb5 in callDynamic (glyphs)
                        inline else => |fn_name| mod.run(fn_name),
                                                        ^
/home/shail/GitHub/mach/src/module.zig:748:61: 0x118e595 in run (glyphs)
                                        modules2.callDynamic(fn_id);
                                                            ^
/home/shail/GitHub/mach/src/module.zig:502:19: 0x12dbf29 in run (glyphs)
            r._run(r._ctx, fn_id);
                  ^
/home/shail/GitHub/mach/src/module.zig:507:18: 0x125593c in call__anon_76619 (glyphs)
            r.run(fn_id);
                 ^
/home/shail/GitHub/mach/examples/glyphs/App.zig:318:20: 0x125493f in tick (glyphs)
    sprite_mod.call(.tick);
                   ^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x11fdaac in run__anon_67661 (glyphs)
                    switch (@typeInfo(Ret)) {
                            ^
/home/shail/GitHub/mach/src/module.zig:785:57: 0x11cefa0 in callDynamic (glyphs)
                        inline else => |fn_name| mod.run(fn_name),
                                                        ^
/home/shail/GitHub/mach/src/module.zig:748:61: 0x118e595 in run (glyphs)
                                        modules2.callDynamic(fn_id);
                                                            ^
/home/shail/GitHub/mach/src/module.zig:502:19: 0x11b4709 in run (glyphs)
            r._run(r._ctx, fn_id);
                  ^
/home/shail/GitHub/mach/src/Core.zig:259:17: 0x11b4672 in main (glyphs)
    core_mod.run(core.on_tick.?);
                ^
/home/shail/GitHub/mach/src/module.zig:762:29: 0x116c63f in run__anon_10172 (glyphs)
                    switch (@typeInfo(Ret)) {
                            ^
/home/shail/GitHub/mach/src/module.zig:723:40: 0x1166107 in run__anon_6929 (glyphs)
                            callMod.run(callFn);
                                       ^
/home/shail/GitHub/mach/src/entrypoint/main.zig:13:12: 0x116521a in main (glyphs)
    app.run(.main);
           ^
/home/shail/Apps/zig-linux-x86_64-0.14.0-dev.2577+271452d22/lib/std/start.zig:656:37: 0x11664ce in main (glyphs)
            const result = root.main() catch |err| {
                                    ^
???:?:?: 0x7fe4e1a9e27d in ??? (libc.so.6)
Unwind information for `libc.so.6:0x7fe4e1a9e27d` was not available, trace may be incomplete

run-glyphs
└─ run glyphs failure
error: the following command terminated unexpectedly:
/home/shail/GitHub/mach/zig-out/bin/glyphs 

Notes

I added some debug logging, and it appears that the function findBestAllocator https://github.com/hexops/mach/blob/main/src/sysgpu/vulkan.zig#L3545 always returns 1 (MemoryKind of linear). However, when I check out https://github.com/hexops/mach/pull/1349/commits/e4f09e94fa9e7f9b2e37bd04f456c07edd783e13 from the libdecor PR and force the allocator to be 3 (linear_write_mappable), the glyphs and hardware-check demos work for me.

(I checked out the libdecor branch to try out Wayland. Otherwise I can't run on X11: with the alternative allocator index forced, it throws an out-of-date KHR error.)
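For context on what picking a memory type involves, here is a minimal sketch of the usual Vulkan selection pattern: take the first memory type that the resource allows and that has all the required property flags. This is not Mach's findBestAllocator. The function name findMemoryType and the four-entry table are made up for illustration; only the three flag bit values are the ones defined by Vulkan.

```zig
const std = @import("std");

// Bit values of VK_MEMORY_PROPERTY_*_BIT as defined by Vulkan.
const DEVICE_LOCAL: u32 = 0x1;
const HOST_VISIBLE: u32 = 0x2;
const HOST_COHERENT: u32 = 0x4;

/// Hypothetical helper (not Mach's API): returns the index of the first
/// memory type that is permitted by `type_bits` (the memoryTypeBits mask a
/// resource reports) and that has every flag in `required`.
/// Assumes at most 32 entries, which is Vulkan's limit on memory types.
fn findMemoryType(type_flags: []const u32, type_bits: u32, required: u32) ?u32 {
    for (type_flags, 0..) |flags, i| {
        const allowed = ((type_bits >> @as(u5, @intCast(i))) & 1) == 1;
        if (allowed and (flags & required) == required) {
            return @as(u32, @intCast(i));
        }
    }
    return null;
}

pub fn main() void {
    // Made-up memory type table, for illustration only.
    const types = [_]u32{
        DEVICE_LOCAL,
        DEVICE_LOCAL,
        HOST_VISIBLE | HOST_COHERENT,
        DEVICE_LOCAL | HOST_VISIBLE | HOST_COHERENT,
    };
    // An upload buffer written by the CPU needs host-visible memory.
    if (findMemoryType(&types, 0b1111, HOST_VISIBLE | HOST_COHERENT)) |idx| {
        std.debug.print("chosen memory type index: {d}\n", .{idx});
    } else {
        std.debug.print("no suitable memory type\n", .{});
    }
}
```

If a selection routine ignores the required flags, or the resource's allowed-type mask, it can hand back a type the driver refuses to allocate from, which might surface as an out-of-memory error even when plenty of memory is free. Whether that is what happens here is still a guess.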

I'm printing out the size it attempts to allocate, and it's pretty small (64 MiB at most):

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 33554432

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 33554432

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 4194304

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 128

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864

info(vulkan): got mem type index of linear
info(vulkan): attempting to allocate 67108864

thread 33431 panic: api error
/home/shail/.cache/zig/p/12208333c8b3551908b66b8a421e7127bdf0c1806063836576283860eff99c99c521/vk.zig:29926:54: 0x123a911 in allocateMemory (glyphs)
                Result.error_out_of_device_memory => return error.OutOfDeviceMemory,

So it doesn't seem like it's an actual OOM issue. Maybe the findBestAllocator function isn't returning the correct one?
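As a quick sanity check on the sizes, here is a small sketch that adds up the requests from the debug log. The helper name sumBytes is made up for illustration, and the total assumes every logged allocation is live at the same time, which the log does not confirm.

```zig
const std = @import("std");

// Sizes in bytes, copied from the debug log (the last one is the request that failed).
const logged_sizes = [_]u64{ 67108864, 33554432, 33554432, 4194304, 128, 67108864, 67108864 };

/// Hypothetical helper: sums a list of allocation sizes in bytes.
fn sumBytes(sizes: []const u64) u64 {
    var total: u64 = 0;
    for (sizes) |s| total += s;
    return total;
}

pub fn main() void {
    const mib: u64 = 1024 * 1024;
    const total = sumBytes(&logged_sizes);
    const largest = std.mem.max(u64, &logged_sizes);
    // Integer division, so MiB values are rounded down.
    std.debug.print("largest request: {d} MiB\n", .{largest / mib});
    std.debug.print("total requested: {d} bytes (about {d} MiB)\n", .{ total, total / mib });
}
```

Even counting the failed request, the total comes to about 260 MiB, which is consistent with this not being a genuine out-of-memory condition.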

Interestingly, the core-triangle example doesn't error out and doesn't hit my debug logging, so I guess it doesn't allocate?

shailpatels, Feb 25 '25 02:02

You're not alone here at least :) https://discord.com/channels/996677443681267802/996677444360736831/1328445547748528231

emidoots, Feb 25 '25 06:02

Well, glad it's not just me then! I'll try to poke around in the Vulkan backend; there's probably some 'fun' interaction between Vulkan and NVIDIA causing issues :)

shailpatels, Feb 25 '25 14:02