level-zero icon indicating copy to clipboard operation
level-zero copied to clipboard

Parameter validation layer erroneously checks pCommandQueueGroupProperties

Open etomzak opened this issue 2 years ago • 3 comments

I'm seeing some unexpected behavior with the parameter validation layer enabled.

Expected behavior:

Calling zeDeviceGetCommandQueueGroupProperties() populates the memory pointed to by the pCommandQueueGroupProperties parameter.

Actual behavior:

Calling zeDeviceGetCommandQueueGroupProperties() returns ZE_RESULT_ERROR_INVALID_ARGUMENT.

Comments

The issue is that the validation layer expects the ze_command_queue_group_properties_t::stype field to be set by the application. However, it is the responsibility of the zeDeviceGetCommandQueueGroupProperties() function to initialize the data. The work-around is for the application to partially initialize the data by setting all ze_command_queue_group_properties_t::stype fields before calling zeDeviceGetCommandQueueGroupProperties().

I think this is the offending line, and I suspect the solution is just to remove it.

etomzak avatar Jul 17 '23 12:07 etomzak

thanks @etomzak . It is true that zeDeviceGetCommandQueueGroupProperties populates the data, but only for [out] args. [In] arguments, as stype, are only inputs, and must be set by the application.

ze_structure_type_t stype [in] type of this structure

ze_structure_type_t stype [in] type of this structure

void *pNext [in,out][optional] must be null or a pointer to an extension-specific structure (i.e. contains stype and pNext).

ze_command_queue_group_property_flags_t flags [out] 0 (none) or a valid combination of ze_command_queue_group_property_flag_t

size_t maxMemoryFillPatternSize [out] maximum pattern_size supported by command queue group. See zeCommandListAppendMemoryFill for more details.

uint32_t numQueues [out] the number of physical engines within the group.

jandres742 avatar Jul 19 '23 06:07 jandres742

That is a ... surprising API design. I'm not convinced it makes sense. Intuitively, even if ze_command_queue_group_properties_t::stype is labeled as [in], the data flow direction in an API query is from the runtime to the application, so one would expect the runtime to put the information "in" to the struct so that the application can then read it out.

More broadly, the concept of [in]/[out] is only relevant to descriptor structs (structs that are used to bundle function arguments together). It's strange to even have them on the fields of something like ze_command_queue_group_properties_t.

I looked up Vulkan to see what they do in this situation. A couple of examples are VkDescriptorSetLayoutBinding and VkQueueFamilyProperties. Interestingly, these structs don't have an sType field at all, assumedly because Vulkan only uses sType for *Info-style structs (parameter bundling).

etomzak avatar Jul 21 '23 17:07 etomzak

thanks @etomzak . The use of stype as in as you mention, where in means that users set that value, follows the convention of other APIs, like Vulkan, please see here https://registry.khronos.org/vulkan/site/guide/latest/pnext_and_stype.html#:~:text=The%20void*%20pNext%20is%20used,extensions%20that%20expose%20new%20structures.

In L0, stype is use for properties and other structs.

Having stype as in is needed so the user can tell the target driver which structure the user is passing. W/o this , the driver would have to guess which struct the application is trying to use, which might be not efficient or easy to implement.

jandres742 avatar Aug 07 '23 19:08 jandres742