Application hangs when GPU-AV is enabled
Environment:
- OS: Windows 11
- GPU and driver version: NVIDIA 552.44
- SDK or header version if building from repo: build from source (commit e74bfc215bf985d5cb515dcbd972b7b33611ba63)
- Options enabled (synchronization, best practices, etc.):
Core:
- Object in Use
- Shader (with Caching)
- Object Lifetime
- Stateless Parameter
GPU-AV with the following settings:
- Reserve Descriptor Set Binding Slot
- Linear Memory Allocation Mode
- Descriptor and OOB Checks (with Generate warning on OOB accesses even if robustness is enabled)
- Validate RayQuery SPIR-V Instructions
Describe the Issue
The application hangs. While it is hung, I found two threads that are each trying to acquire a write lock on something.
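For readers unfamiliar with this kind of hang: one common way for two threads to end up stuck on write locks is lock-order inversion, where each thread holds one lock and waits for the other. Whether that is what happens here has not been established. The sketch below is a generic C++17 illustration, not code from the validation layers. `Guarded`, `bump_both`, and `run_two_writers` are hypothetical names made up for the example. It shows two writers that name the same two locks in opposite orders, and uses `std::scoped_lock` so that both locks are taken together without deadlocking.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>

// Hypothetical stand-in for a piece of shared state guarded by its own
// reader/writer lock. Not taken from the validation layers.
struct Guarded {
    std::shared_mutex lock;
    long value = 0;
};

// Each writer needs exclusive access to both objects. Locking them one at a
// time in opposite orders (a then b in one thread, b then a in the other) can
// leave each thread holding one lock and waiting forever for the other.
// std::scoped_lock acquires all of its mutexes with a deadlock-avoidance
// algorithm, so the order in which the arguments are written does not matter.
inline void bump_both(Guarded& first, Guarded& second, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::scoped_lock guard(first.lock, second.lock);
        ++first.value;
        ++second.value;
    }
}

// Runs two writers that pass the objects in opposite orders and returns the
// final counter value, which should be 2 * iterations.
inline long run_two_writers(int iterations) {
    assert(iterations >= 0);
    Guarded a, b;
    std::thread t1(bump_both, std::ref(a), std::ref(b), iterations);
    std::thread t2(bump_both, std::ref(b), std::ref(a), iterations);
    t1.join();
    t2.join();
    assert(a.value == b.value);
    return a.value;
}
```

Replacing the `std::scoped_lock` line with two separate `std::unique_lock` acquisitions, one per mutex, would reintroduce the possibility of the hang described above, because the two threads would then take the locks in opposite orders.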
@f32by is this something you can hit every time, or does it hang in a different place each time?
It always hangs at the same position with GPU-AV enabled.
Hello @f32by, thank you for reporting this issue. In a way, it comes at the right time given what I am working on. I did not manage to reproduce it, and I am still not really familiar with how we approach multithreading in our code base, so I need more information to make progress.
Would you be able to use a validation layer build from this branch please? https://github.com/arno-lunarg/Vulkan-ValidationLayers/tree/arno-gpu-fix-deadlock
I added some debug logs (printed to standard output) to help me track down what is happening in your case. If you could show me the logs you get, it would be very helpful.
(If need be I can output those logs to a dedicated file)
We want to figure this out, but without a way to reproduce it, that is not possible.
We have made a lot of improvements in the last 2 months, and I am not sure this still occurs in the 1.3.290 SDK.
If the problem persists, please open a new issue and we can try to take another look.