get_local_size inside a kernel with a fixed work-group size forces the spec-constant path for wgSize
Input:
__attribute__((reqd_work_group_size(1, 1, 1)))
__kernel void kernel1D_Test1(__global uint* restrict a_data)
{
  a_data[get_local_id(0)] = 3;
}

__attribute__((reqd_work_group_size(32, 8, 1)))
__kernel void kernel2D_Test2(__global uint* restrict a_data)
{
  const uint localId = get_local_id(0) + get_local_size(0) * get_local_id(1);
  if (localId == 0)
    a_data[localId] = 2;
}
Output:
%uint_1 = OpConstant %uint 1
%10 = OpSpecConstant %uint 1
%11 = OpSpecConstant %uint 1
%12 = OpSpecConstant %uint 1
%gl_WorkGroupSize = OpSpecConstantComposite %v3uint %10 %11 %12
If I inline get_local_size(0) by hand and replace it with 32, I get the expected result:
OpExecutionMode %11 LocalSize 1 1 1
OpExecutionMode %20 LocalSize 32 8 1
As you note, this is a side effect of how get_local_size is implemented. Internally, clspv has a variable to load the size from, and that variable can only get a single initializer, but here there are multiple kernels with different required work-group sizes. If the sizes were all the same, it would work as you expect. There is room for improvement, but I would consider it a lower priority.
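To make the rule above concrete, here is a toy model in Python. It only restates the behaviour described in this thread and is not clspv's actual logic; the function name and the "fixed" / "spec_constant" labels are made up for illustration.

```python
def workgroup_size_mode(required_sizes, uses_get_local_size):
    """Toy model of the behaviour described in this thread (not clspv code).

    required_sizes: list of (x, y, z) tuples, one per kernel in the module.
    Returns "fixed" when LocalSize execution modes are expected, or
    "spec_constant" when the size is expected to fall back to spec constants.
    """
    # Without get_local_size there is no shared size variable to initialize.
    if not uses_get_local_size:
        return "fixed"
    # One shared variable, one initializer: fine if every kernel agrees.
    if len(set(required_sizes)) <= 1:
        return "fixed"
    # Kernels disagree, so no single initializer fits all of them.
    return "spec_constant"


# The two kernels from the report: (1, 1, 1) and (32, 8, 1).
print(workgroup_size_mode([(1, 1, 1), (32, 8, 1)], uses_get_local_size=True))
```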
Thank you very much! That is perfectly OK, and the problem no longer bothers me: I can just use the known wgSize instead of the get_local_size(...) call.
But for a long time I could not understand what had happened. The size silently became 1 1 1 by default, and my Vulkan code did not work correctly.
Eventually I thought to look at the SPIR-V and understood the problem. A warning and/or a note in the documentation would probably help other developers avoid this problem.
By the way, the same issue occurs when C++ is enabled in kernels with:
-cl-std=CLC++ -inline-entry-points
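Until there is a warning in the compiler, the silent fallback can be caught by scanning the SPIR-V disassembly. The Python sketch below assumes the disassembly text uses friendly names, as in the output quoted earlier in this thread (so the composite appears as %gl_WorkGroupSize). The function name check_workgroup_size is hypothetical and not part of clspv or SPIRV-Tools.

```python
import re


def check_workgroup_size(spirv_text):
    """Return a list of warnings for a SPIR-V disassembly given as a string."""
    warnings = []
    # A spec-constant work-group size appears as this composite definition.
    uses_spec_constant = re.search(
        r"%gl_WorkGroupSize\s*=\s*OpSpecConstantComposite", spirv_text
    ) is not None
    # Fixed sizes appear as one LocalSize execution mode per entry point.
    fixed_sizes = re.findall(
        r"OpExecutionMode\s+%\S+\s+LocalSize\s+(\d+)\s+(\d+)\s+(\d+)", spirv_text
    )
    if uses_spec_constant and not fixed_sizes:
        warnings.append(
            "work-group size is a specialization constant; "
            "reqd_work_group_size values are not baked into the module"
        )
    return warnings
```

If the list is non-empty, either specialize the work-group size on the Vulkan side when creating the pipeline, or replace get_local_size with the known constant in the kernel, as described above.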