False positive image layout mismatch for VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT descriptors
I'm getting a typical image layout mismatch error: "cannot use VkImage with specific layout X that doesn't match the previous known layout Y".
It could be a bug related to the delayed validation of descriptors that have the UPDATE_AFTER_BIND flag, and more specifically to how the CMD_BUFFER_STATE::validate_descriptorsets_in_queuesubmit state is used in this scenario. I'm not 100% sure, though. Please check the pseudocode below for the use case that causes the issue; the comments explain what happens:
create DescriptorSet with VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT_EXT image binding
create Image/ImageView
vkUpdateDescriptorSets(DescriptorSet, ImageView+layoutX)
vkCmdPipelineBarrier(cb, transition Image to layoutX)
vkCmdBindDescriptorSets(cb, 0, DescriptorSet)
vkCmdBindPipeline(cb, Pipeline)
//
// here validation layer will insert DescriptorSet into validate_descriptorsets_in_queuesubmit
//
vkCmdDispatch(cb, 1,1,1)
vkCmdPipelineBarrier(cb, transition Image to layoutY)
//
// CoreChecks::ValidateCommandBuffersForSubmit will go over descriptor sets in
// validate_descriptorsets_in_queuesubmit and will check if current image layout (layoutY)
// matches layout specified in the descriptor (layoutX) which is not the case and the error
// is reported, even though image layout state was correct during dispatch.
//
vkQueueSubmit(cb)
Hello,
Are there any updates on this issue? In our engine it spams every frame, and the workaround of disabling image layout transition validation completely is not ideal.
I prepared a small program that shows the issue. I tested it using the latest validation layers code as of Jan 18 2021.
Here's the source code: https://github.com/kennyalive/layout-mismatch-false-positive
It's Windows-only and can be built with Visual Studio 2019. All dependencies are included; just press F7 to build.
The program is based on a simple Vulkan application that was originally free of validation errors. If you check the commit history, you can see that in the last commit, with the message "Add code that reproduces bug in validation", I marked the descriptor binding with the flag VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT (and also added the necessary flags to the descriptor pool and set layout).
Apart from adding this flag, the program is not changed in any other way. For example, I don't try to update the descriptor after bind (effectively, the VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT functionality is not used).
I checked the corresponding part of the validation layer source code. The logic there is that if a descriptor has the flag VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT, the draw state is not checked immediately; validation is postponed until submission. The problem is that between a specific Draw/Dispatch point and the Submit point the image layout can change multiple times, so the layout specified in the image descriptor's layout field does not necessarily match the layout the image has at the submission point. For example, in my program I transition the image from the general layout to the presentation layout after Dispatch and before Submit.
I understand that for descriptors with VK_DESCRIPTOR_BINDING_UPDATE_AFTER_BIND_BIT we can't validate at the Draw/Dispatch point, because the image layout specified by the descriptor is not necessarily final at that point (it can be updated later). But it looks like the solution here is to store the current image layout at the Draw/Dispatch point and then compare it against the final descriptor's layout at the Submit point.
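To make the proposal concrete, here is a minimal C++ sketch of the two strategies. It is a toy model, not the validation layers' code: ImageId, Layout, ImageDescriptor, CommandBufferModel and all of its methods are hypothetical names invented for illustration, and it ignores subresources, multiple command buffers and queue ordering. It only shows why comparing against the layout recorded at Dispatch avoids the false positive from the pseudocode above.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for VkImage and VkImageLayout (not the real types).
using ImageId = std::uint32_t;
enum class Layout { Undefined, General, PresentSrc };

// An update-after-bind image descriptor: its layout field may still be
// rewritten by the application until the command buffer is submitted.
struct ImageDescriptor {
    ImageId image;
    Layout layout;
};

// Toy model of the per-command-buffer state relevant to this issue.
class CommandBufferModel {
public:
    // Models a layout transition recorded with vkCmdPipelineBarrier.
    void Barrier(ImageId image, Layout new_layout) { current_[image] = new_layout; }

    // Models vkCmdDispatch: validation is deferred, but the image's layout at
    // this point is remembered. The descriptor must outlive this object.
    void Dispatch(const ImageDescriptor& descriptor) {
        deferred_.push_back({&descriptor, LayoutOf(descriptor.image)});
    }

    // Behaviour described in the report: at submit, compare the descriptor's
    // layout with the image's layout at the end of the command buffer.
    int MismatchesAgainstFinalLayout() const {
        int count = 0;
        for (const DeferredCheck& check : deferred_) {
            if (check.descriptor->layout != LayoutOf(check.descriptor->image)) ++count;
        }
        return count;
    }

    // Proposed behaviour: at submit, compare the (now final) descriptor layout
    // with the layout the image had when the dispatch was recorded.
    int MismatchesAgainstSnapshot() const {
        int count = 0;
        for (const DeferredCheck& check : deferred_) {
            if (check.descriptor->layout != check.layout_at_use) ++count;
        }
        return count;
    }

private:
    struct DeferredCheck {
        const ImageDescriptor* descriptor;  // re-read at submit time
        Layout layout_at_use;               // snapshot taken at Dispatch
    };

    Layout LayoutOf(ImageId image) const {
        auto it = current_.find(image);
        return it == current_.end() ? Layout::Undefined : it->second;
    }

    std::unordered_map<ImageId, Layout> current_;
    std::vector<DeferredCheck> deferred_;
};

// Replays the sequence from the pseudocode: layoutX = General, layoutY = PresentSrc.
void ReplayReportedSequence() {
    ImageDescriptor descriptor{1, Layout::General};
    CommandBufferModel cb;
    cb.Barrier(1, Layout::General);     // transition Image to layoutX
    cb.Dispatch(descriptor);            // layout is correct here
    cb.Barrier(1, Layout::PresentSrc);  // transition Image to layoutY
    assert(cb.MismatchesAgainstFinalLayout() == 1);  // the false positive
    assert(cb.MismatchesAgainstSnapshot() == 0);     // no error reported
}
```

In this model the snapshot check still catches a real error: if the descriptor is updated after Dispatch to a layout the image did not have at that point, the comparison at submit fails.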
@jzulauf-lunarg, can you bump this in priority please?
@Tony-LunarG updated this most recently, I can certainly support efforts to address.
@kennyalive -- excellent reproduction case. Validation at queue submit was incorrectly validating the state of the descriptor against the last state of the image layout in the command buffer, not the first state. A fix is in progress, and it no longer produces the false positive.
It's a significant change, so it's going to take some additional testing before merging.
@kennyalive -- after further analysis, the fix will simply cause a different set of false positives. (even though it fixes yours)
Okay, but is it worth making the change then? I mean, does it make the validation layers better in general, or could the area of new false positives be larger than that of the old ones? My main motivation is to improve the validation layers; I'm okay with getting a fix for my use case later if that is better for the validation layers.
@kennyalive equal in magnitude, opposite in sign. Several tiers of solutions in discussion.
@kennyalive, while not a solution in itself, you can quiet the spam by using the message_id_filter layer option. In the vk_layer_settings.txt file, this would be khronos_validation.message_id_filter (see the vk_layer_settings.txt file for details) or you can use the VK_LAYER_MESSAGE_ID_FILTER environment variable which takes a semicolon-separated list on Windows and a colon-separated list on Linux.
@mark-lunarg, thanks for the suggestion. My current solution is to use vk_layer_settings with khronos_validation.disables = VALIDATION_CHECK_DISABLE_IMAGE_LAYOUT_VALIDATION but it sounds like your solution provides better granularity.
@kennyalive -- the above PR disables the image layout checks at queue submit time that were generating the false positives, but only narrowly: all other queue submit checks for update-after-bind are retained.
A broader fix to reenable those checks is planned.