Add shared memory support for L0 path
We need to add shared memory support for Intel GPU using L0 backend.
- Create an equivalent of the Nvidia GpuSharedMemoryTest.cpp for Intel GPU via L0
- Activate SM on Backend / Execution Engine.
- Check hardcoded parameters for each GPU architecture
Created branch: 116-add-shared-memory-support-for-l0-path
Moved to pakurapo/smem-2 branch, need to change the is_l0 flag to true in several place to avoid CPU fallback (and use GPU via L0). Otherwise reported address space casting issue
Clarify strange address space casting:
%dest_slot_adr_0 = addrspacecast i8 addrspace(3)* %2 to i32 addrspace(21875)*
Large address space issue related to the traits object? solved using
if (slot_bytes == sizeof(int32_t)) {
// return traits.smemPointerType(llvm::Type::getInt32Ty(context));
return llvm::Type::getInt32PtrTy(context, /*address_space=*/3)
generate:
%dest_slot_adr_0 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*
Current issue:
/GpuSharedMemoryTestIntel.INFO
2023-05-03T04:39:46.972076 I 1772468 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
2023-05-03T04:39:47.003210 I 1772468 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does not support the following opcode: 65534
^@
L0Mgr issue, it seems to be an invalid opcode as 65535 is not reserved by any vendor: https://github.com/KhronosGroup/SPIRV-Headers/blob/main/include/spirv/spir-v.xml#L152
<ids type="opcode" start="6656" end="65535" comment="Opcode range reservable for future use by vendors"/>
Wrong data is read as an opcode earlier in the IGC SPIR-V process
Searching the invalid opcode, see spir-v disassembly
Invalid casting in the write_projection_int64 from RuntimeFunctions.cpp
/usr/bin/llvm-as-14: after.linking.before.insert_declaration.spirv.ll:1712:16: error: invalid cast opcode for cast from 'i8 addrspace(4)*' to 'i64*'
%6 = bitcast i8 addrspace(4)* %0 to i64*, !ResultSetReductionCodegen.cpp !10
Need to create a custom method in genx.ll with the appropriate casting to addrspace(4)
Added write_projection_int64 in genx.ll with the casting to addrspacecast(4)
%6 = bitcast i8 addrspace(4)* %0 to i64 addrspace(4)*
write_projection_int32 is required in genx.ll as well.
Now same casting issue is appearing for agg_sum_double_skip_val, this function needs to be added in genx.ll as well. (with the casting addrspace(4)).
Fixed several function and added implementation in genx.ll
Need to fix this one:
/usr/bin/llvm-as-14: after.linking.spirv.ll:1978:13: error: '@reduce_from_smem_to_gmem' defined with type 'void (i64 addrspace(4)*, i64 addrspace(4)*, i32)*' but expected 'void (i64 addrspace(4)*, i64 addrspace(3)*, i32)*'
call void @reduce_from_smem_to_gmem(i64 addrspace(4)* %output_buffer_ptr, i64 addrspace(3)* %smem_input_buffer_ptr, i32 14952)
Now on the shared_memory related issue:
$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:1258:45: error: use of undefined value '@slm.buf.i64'
%res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*
Current genx.ll file:
; TODO(Petr): this should be dynamically sized depending on the arch, but at least 64KB
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64] zeroinitializer, align 8
; https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpControlBarrier
declare void @__spirv_ControlBarrier(i32 %execution_scope, i32 %memory_scope, i32 %memory_semantics)
; L0 zeroes out the memory by default, so an empty implementation can work in simple cases
define i64 addrspace(3)* @init_shared_mem(i64 addrspace(4)* %agg_init_val, i32 noundef %groups_buffer_size) {
%res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*
ret i64 addrspace(3)* null
Need to copy the global variables from genx.ll to the IR Done following https://github.com/intel-ai/hdk/compare/main...pakurapo/slm-copying-example
Initializer is not copied, need to add it separately
$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:25:76: error: expected value token
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64], align 8
Missing zeroinitializer at the end
Solved init issue using
// Initialize shared memory buffer
const auto slm_buffer = module_->getNamedGlobal("slm.buf.i64");
CHECK(slm_buffer);
llvm::ArrayType* ArrayTy_0 =
llvm::ArrayType::get(llvm::IntegerType::get(module_->getContext(), 64), 1024);
llvm::ConstantAggregateZero* const_array_2 =
llvm::ConstantAggregateZero::get(ArrayTy_0);
slm_buffer->setInitializer(const_array_2)
By running the test ./Tests/GpuSharedMemoryTestIntel:
hdk_log/./Tests/GpuSharedMemoryTestIntel =
2023-05-24T05:52:15.323533 I 21622 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
2023-05-24T05:52:15.366145 I 21622 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does not support the following opcode: 65534
Need to rebase and/or clean the current branch with the Intel GPU tests
Current state [2023-05-25]:
- Rebased with origin/main
bc154e46cafcfbd73892bd96cbb5079df8319690 - Now using
lmontigny/smem-3branch - See draft PR https://github.com/intel-ai/hdk/pull/507
- genx.ll is clean, with the required custom methods using the correct addrspace(4) casting.
- No casting issue with
/usr/bin/llvm-as-14 after.linking.spirv.lanymore - Compilation HDK with Intel GPU Test ok
- At runtime,
./Tests/GpuSharedMemoryTestIntelgives:
$ ./Tests/GpuSharedMemoryTestIntel
[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from SingleColumn
[ RUN ] SingleColumn.VariableEntries_CountQuery_4B_Group
Adding global slm.buf.i64
unknown file: Failure
C++ exception with description "L0 error: error occurred when building module, see build log for details" thrown in the test body.
[ FAILED ] SingleColumn.VariableEntries_CountQuery_4B_Group (90 ms)
...
[----------] Global test environment tear-down
[==========] 7 tests from 1 test suite ran. (481 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 7 tests, listed below:
[ FAILED ] SingleColumn.VariableEntries_CountQuery_4B_Group
[ FAILED ] SingleColumn.VariableEntries_CountQuery_8B_Group
[ FAILED ] SingleColumn.VariableSteps_FixedEntries_1
[ FAILED ] SingleColumn.VariableSteps_FixedEntries_2
[ FAILED ] SingleColumn.VariableSteps_FixedEntries_3
[ FAILED ] SingleColumn.VariableSteps_FixedEntries_4
[ FAILED ] SingleColumn.VariableNumBuffers
7 FAILED TESTS
- L0 error from
hdk_logshows:
- 2023-05-25T10:11:17.637143 I 565758 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
2023-05-25T10:11:17.690471 I 565758 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does not support the following opcode: 65534
- Unclear where the unsupported opcode comes from..
Internal recommendations using IR
A couple of suggestions how to proceed here
I don't think that looking into LLVM IR would help here, better look at SPIR-V I've just tried to do IR -> SPIR-V -> IR roundtrip with the version of SPIR-V translator we use at intel/llvm - this should be more or less fresh KhronosGroup/SPIRV-LLVM-Translator version I've encountered no problems with that, which suggests that SPIR-V consumer you are using probably has some bug comparing to the upstream SPIR-V translator
- as I can see, SPIR-V produced from this IR doesn't use any extensions and therefore you can try use spirv-val from https://github.com/KhronosGroup/SPIRV-Tools to see if SPIR-V itself is valid. Perhaps it is generated incorrectly in the first place. I mentioned extensions here, because we have a known issue with a lot of Intel SPIR-V extensions not being known to SPIRV-Tools repo
- you can try insert some debug prints to the SPIR-V consumer so it prints every instruction it has parsed. That way you will at least localize which particular instruction confuses the translator and what are the instructions which precede it. I guess you can also use a debugger for that, if that's easier for you
- you can try https://llvm.org/docs/Bugpoint.html to get minimized LLVM IR which reproduces the problem
- you can also try contact someone from IGC - they may know their SPIR-V consumer better and therefore suggest something else as well. XXX could be a good candidate here
Work in progress to debug our SPIR-V
The support is merged. Will keep the issue open until we have enabled the tests in CI.