Synchronization validation error on adjacent (VMA) buffers when drawing with stride
Hi,
I’m getting a synchronization validation error that I’d like to understand better and fix properly, though there is a small chance that it is an inaccuracy in the validation layer itself. I have 2 (vertex) buffers created with VMA, and according to their VmaAllocationInfo, they end up adjacent to each other in the same deviceMemory:
A) deviceMemory: 0x0000060000000006 offset: 1767168 size: 3072
B) deviceMemory: 0x0000060000000006 offset: 1770240 size: 3072
For these buffers, the VkBufferUsageFlags are:
VK_BUFFER_USAGE_VERTEX_BUFFER_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT | VK_BUFFER_USAGE_ACCELERATION_STRUCTURE_BUILD_INPUT_READ_ONLY_BIT_KHR
VmaMemoryUsage is just VMA_MEMORY_USAGE_AUTO_PREFER_DEVICE and VmaAllocationCreateFlags is just VMA_ALLOCATION_CREATE_USER_DATA_COPY_STRING_BIT (probably irrelevant).
I fill buffer A (using a staging buffer and vkCmdCopyBuffer) and then use it for drawing (with stride) without any issues. Then I’m trying to do exactly the same with buffer B, but when recording its vkCmdCopyBuffer, I get the following error:
vkCmdCopyBuffer(): WRITE_AFTER_READ hazard detected. vkCmdCopyBuffer writes to VkBuffer 0x19750000001975, which was previously read by vkCmdDraw. No sufficient synchronization is present to ensure that a write (VK_ACCESS_2_TRANSFER_WRITE_BIT) at VK_PIPELINE_STAGE_2_COPY_BIT does not conflict with a prior read (VK_ACCESS_2_VERTEX_ATTRIBUTE_READ_BIT) at VK_PIPELINE_STAGE_2_VERTEX_ATTRIBUTE_INPUT_BIT. Vulkan insight: an execution dependency is sufficient to prevent this hazard.
However, it’s the first time I’m touching buffer B and only buffer A was used for drawing. I have a basic tracking system for accesses and stages, so I do put a barrier before copying data to B (with dstStage = VK_PIPELINE_STAGE_2_COPY_BIT_KHR and dstAccess = VK_ACCESS_2_TRANSFER_WRITE_BIT_KHR), but since this is its first usage, the srcStage & srcAccess are both 0.
The offsets and sizes in the VkBufferMemoryBarrier seem to be correct for each buffer. Also, calling vkGetBufferMemoryRequirements shows that the alignment is 256 bytes, and I don’t see anything wrong with that.
A few words about my data: These buffers have 251 vertices with only position data, so my actual data size is 251 * 3 * 4 = 3012 bytes for each, and due to the 256 bytes alignment the total buffer size that VMA allocates is 3072 (256 * 12). My intention is to draw 1 vertex every 25, so there are 11 vertices that can be drawn in that buffer. Their indices in the buffer are 0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250 (last vertex in the buffer).
I then noticed that the validation error occurs only when I use the calculated stride to draw these 11 vertices. That stride is 300 bytes (25 * 12), compared to the default of 12 bytes. If I draw the whole buffer with the default stride, no validation error occurs.
I also tried vmaCreateBufferWithAlignment() with different alignment values to test:
512 & 1024 : Still getting the validation error.
2048 & 4096 : No error.
Also, when I use VMA_ALLOCATION_CREATE_DEDICATED_MEMORY_BIT in the VmaAllocationCreateFlags, no validation error occurs either. I guess that's because the 2 buffers then use different deviceMemory objects.
I understand the validation error, but is this expected? I probably shouldn't need an execution dependency for a buffer I've never used before. Yet buffer A's memory "state" seems to affect B somehow, and I don't know how to debug this further. To be honest, I'm suspicious of the last vertex. It is the last piece of data in the buffer, and there is no room for a full "stride" after it. 11 vertices * 300 bytes of stride each gives a perceived total of 3300 bytes. That exceeds the buffer size of 3072 and could indeed reach into the memory of the adjacent buffer, which starts right after the 3072 bytes that the first one occupies. But why would the draw be treated as touching 3300 bytes? This line of thought sounds a bit naive, unless there is a spec definition or assumption somewhere that every element must have a full stride after it. I can't find one, though.
I'm working on Windows 11, reproduced on a few nvidia GPUs with pretty recent drivers, and the latest Vulkan SDK.
Thanks in advance for any help & insights!
Thanks for the report @gkarpa . Is it an open source app so I can check it?
Thank you for the quick response. Unfortunately not... Could a gfxreconstruct capture be of any help? Or if there is any other way I can help, please let me know.
Yes, a capture can be helpful (especially if it reproduces the issue), preferably one made on NVIDIA, but I have some integrated AMD too. An API dump can help too.
I'm recording a gfxreconstruct file (running a debug build of the app), but when I try to replay it locally I get this fatal error almost at the beginning:
[gfxrecon] FATAL - API call at index: 666 thread: 1 vkAllocateMemory returned error value VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS that does not match the result from the capture file: VK_SUCCESS. Replay cannot continue. Replay has encountered a fatal error and cannot continue: A buffer creation or memory allocation failed because the requested address is not available. A shader group handle assignment failed because the requested shader group handle information is no longer valid.
Please note that I also had to disable the VK_EXT_validation_features extension to do the capture (as far as I know, that is what enables synchronization validation). Otherwise, the replayer failed even earlier (during instance creation).
I should note, however, that a capture won't show anything useful visually. If you feel it will be useful anyway and have any tips on getting past this error, feel free to share. Thanks!
@gkarpa I will check the details of this issue sometime this week
I'm attaching the gfxr file as well since I managed to replay it with -m rebind.
Looking into this... I checked the capture and can replay it. I have syncval errors with -m rebind and no errors if I don't specify that option (I guess this makes sense since the program uses buffer device addresses).
I have another laptop with an RTX 3070, and there the capture replays without reporting any errors, both with and without -m rebind. I'm not really familiar with gfxreconstruct, so I can't explain this discrepancy. It mentions that the replay adjusted some structs' array counts (mostly for queue family properties and surface formats & present modes), but these are INFO level, so I consider them unimportant. The initial validation error is consistently reproduced through the app on both devices.
If I use --validate when replaying then I'm getting some errors about the image extent of the swapchain creation and a signaled semaphore that may still be in use during presentation, but these seem irrelevant to the initial issue and don't happen when running the app.
> I'm not really familiar with gfxreconstruct so I can't explain this discrepancy.
I think it's because buffer device addresses are session-specific: if they are stored in the capture, they will be invalid on a different machine. As I understand it, -m rebind regenerates buffer device addresses based on the allocations made on the replay side.
I also have a 3070 laptop and can check there later too, but -m rebind should be fine for development.
@gkarpa First of all, thanks for this very detailed report; it saved me a lot of investigation time. It's a bug in our code. You're right: the stride is only the distance between elements and does not create a stride-sized safe region after the last element. We had already fixed similar issues in some parts of syncval, but this use case exposed one more place. The fix should be ready by the end of the day.
Thank you @artem-lunarg for the fast investigation & resolution, this is relieving to hear. Glad I could help and also thanks for your great work overall, keep it up!