hdk icon indicating copy to clipboard operation
hdk copied to clipboard

Add shared memory support for L0 path

Open kurapov-peter opened this issue 3 years ago • 16 comments

We need to add shared memory support for Intel GPU using L0 backend.

  1. Create an equivalent of the Nvidia GpuSharedMemoryTest.cpp for Intel GPU via L0
  2. Activate SM on Backend / Execution Engine.
  3. Check hardcoded parameters for each GPU architecture

kurapov-peter avatar Nov 18 '22 14:11 kurapov-peter

Created branch: 116-add-shared-memory-support-for-l0-path

lmontigny avatar Apr 13 '23 13:04 lmontigny

Moved to pakurapo/smem-2 branch, need to change the is_l0 flag to true in several place to avoid CPU fallback (and use GPU via L0). Otherwise reported address space casting issue

Clarify strange address space casting:
%dest_slot_adr_0 = addrspacecast i8 addrspace(3)* %2 to i32 addrspace(21875)*

lmontigny avatar May 02 '23 15:05 lmontigny

Large address space issue related to the traits object? solved using

    if (slot_bytes == sizeof(int32_t)) {
      // return traits.smemPointerType(llvm::Type::getInt32Ty(context));
      return llvm::Type::getInt32PtrTy(context, /*address_space=*/3)

generate: %dest_slot_adr_0 = bitcast i8 addrspace(3)* %2 to i32 addrspace(3)*

Current issue:

/GpuSharedMemoryTestIntel.INFO
 2023-05-03T04:39:46.972076 I 1772468 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
 2023-05-03T04:39:47.003210 I 1772468 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does    not support the following opcode: 65534
 ^@

lmontigny avatar May 03 '23 09:05 lmontigny

L0Mgr issue, it seems to be an invalid opcode as 65535 is not reserved by any vendor: https://github.com/KhronosGroup/SPIRV-Headers/blob/main/include/spirv/spir-v.xml#L152

<ids type="opcode" start="6656" end="65535" comment="Opcode range reservable for future use by vendors"/>

Wrong data is read as an opcode earlier in the IGC SPIR-V process

lmontigny avatar May 08 '23 12:05 lmontigny

Searching the invalid opcode, see spir-v disassembly

lmontigny avatar May 09 '23 15:05 lmontigny

Invalid casting in the write_projection_int64 from RuntimeFunctions.cpp

/usr/bin/llvm-as-14: after.linking.before.insert_declaration.spirv.ll:1712:16: error: invalid cast opcode for cast from 'i8 addrspace(4)*' to 'i64*'
  %6 = bitcast i8 addrspace(4)* %0 to i64*, !ResultSetReductionCodegen.cpp !10

Need to create a custom method in genx.ll with the appropriate casting to addrspace(4)

lmontigny avatar May 10 '23 13:05 lmontigny

Added write_projection_int64 in genx.ll with the casting to addrspacecast(4) %6 = bitcast i8 addrspace(4)* %0 to i64 addrspace(4)*

write_projection_int32 is required in genx.ll as well.

Now same casting issue is appearing for agg_sum_double_skip_val, this function needs to be added in genx.ll as well. (with the casting addrspace(4)).

lmontigny avatar May 11 '23 12:05 lmontigny

Fixed several function and added implementation in genx.ll

Need to fix this one:

/usr/bin/llvm-as-14: after.linking.spirv.ll:1978:13: error: '@reduce_from_smem_to_gmem' defined with type 'void (i64 addrspace(4)*, i64 addrspace(4)*, i32)*' but expected 'void (i64 addrspace(4)*, i64 addrspace(3)*, i32)*'
  call void @reduce_from_smem_to_gmem(i64 addrspace(4)* %output_buffer_ptr, i64 addrspace(3)* %smem_input_buffer_ptr, i32 14952)

lmontigny avatar May 11 '23 16:05 lmontigny

Now on the shared_memory related issue:

$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:1258:45: error: use of undefined value '@slm.buf.i64'
  %res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*

Current genx.ll file:

; TODO(Petr): this should be dynamically sized depending on the arch, but at least 64KB
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64] zeroinitializer, align 8

; https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpControlBarrier
declare void @__spirv_ControlBarrier(i32 %execution_scope, i32 %memory_scope, i32 %memory_semantics)

; L0 zeroes out the memory by default, so an empty implementation can work in simple cases
define i64 addrspace(3)* @init_shared_mem(i64 addrspace(4)* %agg_init_val, i32 noundef %groups_buffer_size) {
    %res = bitcast [1024 x i64] addrspace(3)* @slm.buf.i64 to i64 addrspace(3)*
    ret i64 addrspace(3)* null

lmontigny avatar May 12 '23 09:05 lmontigny

Need to copy the global variables from genx.ll to the IR Done following https://github.com/intel-ai/hdk/compare/main...pakurapo/slm-copying-example

lmontigny avatar May 16 '23 16:05 lmontigny

Initializer is not copied, need to add it separately

$ /usr/bin/llvm-as-14 after.linking.spirv.ll
/usr/bin/llvm-as-14: after.linking.spirv.ll:25:76: error: expected value token
@slm.buf.i64 = internal local_unnamed_addr addrspace(3) global [1024 x i64], align 8

Missing zeroinitializer at the end

lmontigny avatar May 17 '23 11:05 lmontigny

Solved init issue using

 // Initialize shared memory buffer
  const auto slm_buffer = module_->getNamedGlobal("slm.buf.i64");
  CHECK(slm_buffer);
  llvm::ArrayType* ArrayTy_0 =
      llvm::ArrayType::get(llvm::IntegerType::get(module_->getContext(), 64), 1024);
  llvm::ConstantAggregateZero* const_array_2 =
      llvm::ConstantAggregateZero::get(ArrayTy_0);
  slm_buffer->setInitializer(const_array_2)

lmontigny avatar May 23 '23 16:05 lmontigny

By running the test ./Tests/GpuSharedMemoryTestIntel:

hdk_log/./Tests/GpuSharedMemoryTestIntel =

 2023-05-24T05:52:15.323533 I 21622 0 0 L0Mgr.cpp:75 Discovered 1 driver(s) for L0 platform.
 2023-05-24T05:52:15.366145 I 21622 0 0 L0Mgr.cpp:221 L0 module build log: error: IGC SPIRV Consumer does not   support the following opcode: 65534

Need to rebase and/or clean the current branch with the Intel GPU tests

lmontigny avatar May 24 '23 10:05 lmontigny

Current state [2023-05-25]:

  • Rebased with origin/main bc154e46cafcfbd73892bd96cbb5079df8319690
  • Now using lmontigny/smem-3 branch
  • See draft PR https://github.com/intel-ai/hdk/pull/507
  • genx.ll is clean, with the required custom methods using the correct addrspace(4) casting.
  • No casting issue with /usr/bin/llvm-as-14 after.linking.spirv.l anymore
  • Compilation HDK with Intel GPU Test ok
  • At runtime, ./Tests/GpuSharedMemoryTestIntel gives:
$ ./Tests/GpuSharedMemoryTestIntel
[==========] Running 7 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 7 tests from SingleColumn
[ RUN      ] SingleColumn.VariableEntries_CountQuery_4B_Group
Adding global slm.buf.i64
unknown file: Failure
C++ exception with description "L0 error: error occurred when building module, see build log for details" thrown in the test body.
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_4B_Group (90 ms)
...
[----------] Global test environment tear-down
[==========] 7 tests from 1 test suite ran. (481 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_4B_Group
[  FAILED  ] SingleColumn.VariableEntries_CountQuery_8B_Group
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_1
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_2
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_3
[  FAILED  ] SingleColumn.VariableSteps_FixedEntries_4
[  FAILED  ] SingleColumn.VariableNumBuffers

 7 FAILED TESTS
  • L0 error from hdk_log shows:
-  2023-05-25T10:11:17.637143 I 565758 0 0 L0Mgr.cpp:75            Discovered 1 driver(s) for L0 platform.
 2023-05-25T10:11:17.690471 I 565758 0 0 L0Mgr.cpp:221 L0        module build log: error: IGC SPIRV Consumer does not support    the following opcode: 65534
  • Unclear where the unsupported opcode comes from..

lmontigny avatar May 25 '23 15:05 lmontigny

Internal recommendations using IR

A couple of suggestions how to proceed here

I don't think that looking into LLVM IR would help here, better look at SPIR-V I've just tried to do IR -> SPIR-V -> IR roundtrip with the version of SPIR-V translator we use at intel/llvm - this should be more or less fresh KhronosGroup/SPIRV-LLVM-Translator version I've encountered no problems with that, which suggests that SPIR-V consumer you are using probably has some bug comparing to the upstream SPIR-V translator

  • as I can see, SPIR-V produced from this IR doesn't use any extensions and therefore you can try use spirv-val from https://github.com/KhronosGroup/SPIRV-Tools to see if SPIR-V itself is valid. Perhaps it is generated incorrectly in the first place. I mentioned extensions here, because we have a known issue with a lot of Intel SPIR-V extensions not being known to SPIRV-Tools repo
  • you can try insert some debug prints to the SPIR-V consumer so it prints every instruction it has parsed. That way you will at least localize which particular instruction confuses the translator and what are the instructions which precede it. I guess you can also use a debugger for that, if that's easier for you
  • you can try https://llvm.org/docs/Bugpoint.html to get minimized LLVM IR which reproduces the problem
  • you can also try contact someone from IGC - they may know their SPIR-V consumer better and therefore suggest something else as well. XXX could be a good candidate here

Work in progress to debug our SPIR-V

lmontigny avatar Jun 09 '23 14:06 lmontigny

The support is merged. Will keep the issue open until we have enabled the tests in CI.

kurapov-peter avatar Aug 22 '23 09:08 kurapov-peter