#116: Add shared memory support for L0 path
Facing an issue with the RuntimeFunctions.bc file for L0:
$ ./Tests/GpuSharedMemoryTestIntel
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SingleColumn
[ RUN ] SingleColumn.VariableEntries_CountQuery_4B_Group
warning: Linking two modules of different target triples: '/localdisk/lmontign/hdk/omniscidb/build/QueryEngine/RuntimeFunctions.bc' is 'nvptx64-nvidia-cuda' whereas '/localdisk/lmontign/hdk/omniscidb/build/QueryEngine/RuntimeFunctions.bc' is 'spir-unknown-unknown'
InvalidTargetTriple: Expects spir-unknown-unknown or spir64-unknown-unknown. Actual target triple is nvptx64-nvidia-cuda
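To catch this kind of mismatch before linking, a small check on the textual IR can help. The following is only a sketch: `parseTargetTriple` and `isSpirTriple` are hypothetical helpers, not part of HDK or LLVM, and the accepted triples are taken from the InvalidTargetTriple message above.

```cpp
#include <string>

// Hypothetical helper (not an HDK or LLVM function): returns true when
// `triple` is one of the two triples named as acceptable in the
// InvalidTargetTriple message.
bool isSpirTriple(const std::string& triple) {
  return triple == "spir-unknown-unknown" || triple == "spir64-unknown-unknown";
}

// Hypothetical helper: extracts the triple from a textual-IR line such as
//   target triple = "nvptx64-nvidia-cuda"
// Returns an empty string when the line is not a target-triple line.
std::string parseTargetTriple(const std::string& line) {
  const std::string prefix = "target triple = \"";
  if (line.compare(0, prefix.size(), prefix) != 0) {
    return "";
  }
  const auto end = line.find('"', prefix.size());
  if (end == std::string::npos) {
    return "";
  }
  return line.substr(prefix.size(), end - prefix.size());
}
```

Running each line of the disassembled bitcode through `parseTargetTriple` and then `isSpirTriple` would flag an NVPTX-targeted module before it reaches the L0 path.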
Solved the previous .bc target-triple mismatch.
Now hitting an address space casting issue:

The casting issue is still present:
%4 = addrspacecast i64* %3 to i64 addrspace(3)*
SPIR-V generation fails here:
std::unique_ptr<L0DeviceCompilationContext> gpu_context(compile_and_link_gpu_code(
    module_str, module_, l0_mgr_, getWrapperKernel()->getName().str()));
auto success = writeSpirv(module, opts, ss, err);
It is unclear where the cast is generated in the application.
It does not seem related to CreatePointerCast; need to double-check GpuSharedMemoryUtils.cpp.
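One way to narrow down where the cast comes from is to dump the module to text (for example with `llvm::Module::print` into an `llvm::raw_string_ostream`) and list every `addrspacecast` together with its enclosing function. The sketch below is a hypothetical debugging aid, not HDK code; `findAddrSpaceCasts` is an invented name, and the parsing assumes plain, unquoted function names on `define` lines.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical debugging aid: scans textual LLVM IR and returns
// (enclosing function name, instruction text) for each addrspacecast found.
// Casts outside any function body are reported under "<global>".
std::vector<std::pair<std::string, std::string>> findAddrSpaceCasts(
    const std::string& ir_text) {
  std::vector<std::pair<std::string, std::string>> hits;
  std::istringstream in(ir_text);
  std::string line;
  std::string current_fn = "<global>";
  while (std::getline(in, line)) {
    if (line.rfind("define ", 0) == 0) {
      // Assumption: the name sits between the first '@' and the next '('.
      const auto at = line.find('@');
      if (at != std::string::npos) {
        const auto paren = line.find('(', at);
        if (paren != std::string::npos) {
          current_fn = line.substr(at + 1, paren - at - 1);
        }
      }
    } else if (line == "}") {
      current_fn = "<global>";  // end of a function body
    } else if (line.find("addrspacecast") != std::string::npos) {
      // Strip leading indentation so only the instruction is reported.
      const auto start = line.find_first_not_of(" \t");
      hits.emplace_back(current_fn, line.substr(start));
    }
  }
  return hits;
}
```

Each reported function name can then be matched against the code generators that emit it, which should show whether the cast originates in GpuSharedMemoryUtils.cpp or elsewhere.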
The casting issue for shared memory happens here, with address_space = 3:
  auto ptr_type = [&context](const size_t slot_bytes, const hdk::ir::Type* type) {
    if (slot_bytes == sizeof(int32_t)) {
      return llvm::Type::getInt32PtrTy(context, /*address_space=*/3);
    } else {
      CHECK(slot_bytes == sizeof(int64_t));
      return llvm::Type::getInt64PtrTy(context, /*address_space=*/3);
    }
    UNREACHABLE() << "Invalid slot size encountered: " << std::to_string(slot_bytes);
    return llvm::Type::getInt32PtrTy(context, /*address_space=*/3);
  };
  const auto casted_dest_slot_address = ir_builder.CreatePointerCast(
      ir_builder.CreateGEP(
          dest_byte_stream->getType()->getScalarType()->getPointerElementType(),
          dest_byte_stream,
          byte_offset),
      ptr_type(slot_bytes, type),
      "dest_slot_adr_" + std::to_string(slot_idx));
  return casted_dest_slot_address;
}