[cmake] Build parameter consistent issue with composable_kernelConfig.cmake

Open junliume opened this issue 1 year ago • 5 comments

[Observations]: when building CK for specific platforms such as gfx1100:

sudo CXX=/opt/rocm/bin/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release -DGPU_ARCHS="gfx1100" ..

naturally /opt/rocm/lib/cmake/composable_kernel/composable_kerneldevice_mha_operationsTargets.cmak is *NOT generated because the platform does not yet fully support MHA operations.

However, when we try to build other libraries which depends on CK, e.g. MIOpen, we will observe the following error:

CMake Error at /opt/rocm/lib/cmake/composable_kernel/composable_kernelConfig.cmake:24 (include):
  include could not find requested file:

    /opt/rocm/lib/cmake/composable_kernel/composable_kerneldevice_mha_operationsTargets.cmake
Call Stack (most recent call first):
  CMakeLists.txt:330 (find_package)

[Reproduce Steps]: listed above

[Analysis]: when building CK for specific platforms such as gfx1100:

sudo CXX=/opt/rocm/bin/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release -DGPU_ARCHS="gfx1100" ..

this generated file build/composable_kernelConfig.cmake


set(_composable_kernel_supported_components device_other_operations device_gemm_operations device_conv_operations device_mha_operations device_contraction_operations device_reduction_operations utility)

foreach(_comp ${composable_kernel_FIND_COMPONENTS})
        if(NOT _comp IN_LIST _composable_kernel_supported_components)
                set(composable_kernel_FOUND False)
                set(composable_kernel_NOT_FOUND_MESSAGE "Unsupported component: ${_comp}")
        endif()
        include("${CMAKE_CURRENT_LIST_DIR}/composable_kernel${_comp}Targets.cmake")
endforeach()

Apparently, the _composable_kernel_supported_components needs to be adjusted according to build parameters.

Nov 07 '24 19:11 junliume

https://github.com/ROCm/composable_kernel/blob/7965d66a81bc4a8189ba5b4f30641f346be935cf/Config.cmake.in

Nov 07 '24 19:11 junliume

The problem with not only in _composable_kernel_supported_components. Sometimes some generic component is supported, but depending on GPU arch set some functions (namely, xdl functions on offload targets != gfx9xx) are missing in static library.

Given that, I don't see a solution purely on composable_kernel side. So for now I added 2 patches:

composable-kernel-6.3.0-conditional-kernels.patch - add soft skip if component is not found
miopen-6.3.0-conditional-ck-components.patch + group of replacements - rework miopen to compile kernel implementations based on MIOPEN_USE_CK_MHA_OPS and MIOPEN_USE_CK_XDL_OPS defines, mostly based on filenames.

Any solutions on your side are welcome, or feel free to apply these patches.

Dec 23 '24 16:12 AngryLoki

Same on ROCm 6.4.1. I faced that exact error when compiling for any subset of kernels the hardware supports. I am targeting the gfx1010 and gfx1012 ISAs. After applying @AngryLoki's patches, the MIOpen build with composable kernels resulted in a linker error, most of which have XDL operations (appears incompatible with RDNA cards). It seems I need the complete set that MIOpen expects, which will likely take a few days to run the entire composable kernel build process.

CMake Deprecation Warning at /opt/rocm/share/rocm/cmake/ROCMConfig.cmake:12 (message):
  Use of find_package(ROCM) is deprecated as of ROCm 6.4.  Please use
  find_package(ROCmCMakeBuildTools)
Call Stack (most recent call first):
  CMakeLists.txt:83 (find_package)


-- Found nlohmann_json: /usr/share/cmake/nlohmann_json/nlohmann_jsonConfig.cmake (found suitable version "3.11.3", minimum required is "3.9.1") 
-- MIOPEN_WORKAROUND_USE_BOOST_FILESYSTEM OFF
-- MIOPEN_OFFLINE_COMPILER_PATHS_V2: OFF
-- MIOPEN_BUILD_DRIVER: ON
-- Build with HIP 6.4.43483 /opt/rocm/lib/cmake/hip
-- Enable Composable Kernels: ON
-- Hip compiler flags:  -x hip    -D__HIP_PLATFORM_AMD__=1  -DUSE_PROF_API=1 --hip-link    -fno-offload-uniform-block 
CMake Warning at CMakeLists.txt:338 (message):
  CK component 'device_mha_operations' not found.


-- OpenCL compiler: /opt/rocm/llvm/bin/amdclang
-- Build with rocblas 4.4.0 /opt/rocm/lib/cmake/rocblas
-- Build without hipbBLASLt
-- HIP backend selected.
-- clang-offload-bundler found: /opt/rocm/llvm/bin/clang-offload-bundler
-- Build with rocMLIR::rockCompiler 2.0.0 /opt/rocm/lib/cmake/rocmlir
-- AMDGCN assembler: /opt/rocm/llvm/bin/clang
-- Build with amd_comgr 3.0.0 /opt/rocm/lib/cmake/amd_comgr
CMake Deprecation Warning at /opt/rocm/lib/cmake/hiprtc/hiprtc-config.cmake:21 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
  CMakeLists.txt:487 (find_package)


-- Build with hiprtc 6.4.43483 /opt/rocm/lib/cmake/hiprtc
-- HALF_INCLUDE_DIR: /opt/rocm/include
-- Found nlohmann_json: /usr/share/cmake/nlohmann_json/nlohmann_jsonConfig.cmake (found version "3.11.3") 
-- Build with frugally-deep 0.15.20 /usr/lib/cmake/frugally-deep
-- Build with Eigen3 3.4.0 /usr/share/eigen3/cmake
-- Build with rocTracer: /opt/rocm/lib/libroctx64.so
-- Build with rocrand 3.3.0 /opt/rocm/lib/cmake/rocrand
CMake Warning at CMakeLists.txt:696 (message):
  GIT LFS files not pulled down, skipped: gfx942.kdb, gfx90a.kdb,
  gfx1030.kdb, gfx908.kdb, gfx906.kdb, gfx900.kdb


-- rocm-cmake: Set license file to /root/ROCm/MIOpen/LICENSE.txt.
-- Clang tidy found: 19.0.0git
-- MIOpen_VERSION= 3.4.0
-- CMAKE_BUILD_TYPE= Release
-- Librt: /usr/lib/x86_64-linux-gnu/librt.a
-- Configuring done (2.4s)
-- Generating done (0.2s)
-- Build files have been written to: /root/rocm-build/miopen
[0/2] Re-checking globbed directories...
[620/620] Linking CXX executable bin/MIOpenDriver
FAILED: bin/MIOpenDriver 
: && /opt/rocm/lib/llvm/bin/amdclang++ -march=native -O3 -s -w -Wno-error=unused-command-line-argument -O3 -DNDEBUG -s -pthread driver/CMakeFiles/MIOpenDriver.dir/InputFlags.cpp.o driver/CMakeFiles/MIOpenDriver.dir/conv_common.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_activ.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_adam.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_addlayernorm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_bnorm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_cat.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_conv.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_convbfp16.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_convbfp8.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_convfp16.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_convfp8.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_convint8.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_dropout.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_fusion.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_gemm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_getitem.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_glu.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_groupnorm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_kthvalue.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_layernorm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_lrn.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_multimarginloss.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_pool.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_prelu.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_reduce.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_reduceextreme.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_reducecalculation.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_rnn.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_rope.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_softmarginloss.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_softmax.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_t5layernorm.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_tensorop.cpp.o driver/CMakeFiles/MIOpenDriver.dir/dm_transformers_adam_w.cpp.o driver/CMakeFiles/MIOpenDriver.dir/main.cpp.o driver/CMakeFiles/MIOpenDriver.dir/registry_driver_maker.cpp.o driver/CMakeFiles/MIOpenDriver.dir/rocrand_wrapper.cpp.o -o bin/MIOpenDriver  -Wl,-rpath,/root/rocm-build/miopen/lib:  lib/libMIOpen.so.1.0  /opt/rocm/lib/librocrand.so.1.1  --hip-link  --offload-arch=gfx1010  --offload-arch=gfx1012  /opt/rocm/lib/libamd_comgr.so.3.0  /opt/rocm/lib/librocblas.so.4.4  /usr/lib/x86_64-linux-gnu/libboost_filesystem.a  /usr/lib/x86_64-linux-gnu/libboost_atomic.a  /usr/lib/x86_64-linux-gnu/librt.a  /opt/rocm/lib/libamdhip64.so.6.4.43483-a187df25c  /opt/rocm/lib/llvm/lib/clang/19/lib/linux/libclang_rt.builtins-x86_64.a && :
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scale_ndhwgc_gkzyxc_ndhwgk_bf16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, float, float, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_scale_ndhwgk_gkzyxc_ndhwgc_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_bilinear_ndhwgk_gkzyxc_ndhwgc_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_bilinear_ndhwgk_gkzyxc_ndhwgc_bf16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_int8_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, signed char, signed char, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, signed char, signed char, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, signed char, signed char, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, signed char, signed char, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_fwd_xdl_c_shuffle_bias_relu_nhwc_kyxc_nhwk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwdBiasActivation<ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::AddRelu>, std::default_delete<ck::tensor_operation::device::DeviceConvFwdBiasActivation<ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::AddRelu> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwdBiasActivation<ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::AddRelu>, std::default_delete<ck::tensor_operation::device::DeviceConvFwdBiasActivation<ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::AddRelu> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_scale_ndhwgk_gkzyxc_ndhwgc_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_bilinear_ndhwgk_gkzyxc_ndhwgc_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGC>, ck::tensor_layout::convolution::NDHWGC, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scale_ndhwgc_gkzyxc_ndhwgk_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, float, float> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scale_ndhwgc_gkzyxc_ndhwgk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, _Float16, _Float16> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_bilinear_ndhwgc_gkzyxc_ndhwgk_int8_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<signed char>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<signed char>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, signed char, signed char> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<signed char>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<signed char>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, signed char, signed char> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scaleadd_scaleadd_relu_ndhwgc_gkzyxc_ndhwgk_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float, float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float, float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, float, float> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float, float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float, float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, float, float> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_bilinear_ndhwgc_gkzyxc_ndhwgk_f32_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, float, float, ck::Tuple<float>, float, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, float, float> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_bilinear_ndhwgc_gkzyxc_ndhwgk_bf16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, unsigned short, unsigned short> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scaleadd_scaleadd_relu_ndhwgc_gkzyxc_ndhwgk_bf16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short, unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short, unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, unsigned short, unsigned short> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short, unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, unsigned short, unsigned short, ck::Tuple<unsigned short, unsigned short>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, unsigned short, unsigned short> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scale_ndhwgc_gkzyxc_ndhwgk_int8_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, signed char, signed char> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, signed char, signed char> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_bilinear_ndhwgc_gkzyxc_ndhwgk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Bilinear, _Float16, _Float16> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `fmha_fwd(fmha_fwd_traits, fmha_fwd_args, ck_tile::stream_config const&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvBwdData<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_fwd_xdl_nhwc_kyxc_nhwk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scaleadd_scaleadd_relu_ndhwgc_gkzyxc_ndhwgk_int8_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<float, float>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<float, float>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, signed char, signed char> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<float, float>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, signed char, signed char>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, signed char, signed char, ck::Tuple<float, float>, signed char, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, signed char, signed char> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_bwd_data_xdl_scale_ndhwgk_gkzyxc_ndhwgc_bf16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvBwdDataMultipleD<3, ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<>, ck::tensor_layout::convolution::NDHWGC, unsigned short, unsigned short, ck::Tuple<>, unsigned short, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::Scale, unsigned short, unsigned short> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_grouped_conv3d_fwd_xdl_scaleadd_scaleadd_relu_ndhwgc_gkzyxc_ndhwgk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16, _Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16, _Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, _Float16, _Float16> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16, _Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, _Float16, _Float16>, std::default_delete<ck::tensor_operation::device::DeviceGroupedConvFwdMultipleABD<3, ck::tensor_layout::convolution::NDHWGC, ck::tensor_layout::convolution::GKZYXC, ck::Tuple<ck::tensor_layout::convolution::NDHWGK, ck::tensor_layout::convolution::G_K>, ck::tensor_layout::convolution::NDHWGK, _Float16, _Float16, ck::Tuple<_Float16, _Float16>, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::ScaleAddScaleAddRelu, _Float16, _Float16> > > > >&)'
/usr/bin/ld: lib/libMIOpen.so.1.0: undefined reference to `ck::tensor_operation::device::instance::add_device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk_f16_instances(std::vector<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > >, std::allocator<std::unique_ptr<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough>, std::default_delete<ck::tensor_operation::device::DeviceConvFwd<2, ck::tensor_layout::convolution::NHWC, ck::tensor_layout::convolution::KYXC, ck::tensor_layout::convolution::NHWK, _Float16, _Float16, _Float16, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough, ck::tensor_operation::element_wise::PassThrough> > > > >&)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

Jul 18 '25 01:07 TheTrustedComputer

Adding gfx908 to GPU_ARCHS, caused all functions except fmha_fwd(...) to disappear.

However, replacing gfx908 with a gfx90a to eliminate the last function resulted in a compilation error midway.

In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:231:26: error: invalid operand for instruction
  231 |             asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                          ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v17
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:231:26: error: invalid operand for instruction
  231 |             asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                          ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v10
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:231:26: error: invalid operand for instruction
  231 |             asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                          ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v12
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:231:26: error: invalid operand for instruction
  231 |             asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                          ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v18
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v50
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v51
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v52
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v55
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v57
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v53
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v54
      |                       ^
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd_d32_fp16_batch_b128x64x16x32x32x32_r2x1x1_r2x1x1_w32x32x16_w32x32x16_qr_async_vr_psddv_lse_dropout.cpp:5:
In file included from /root/rocm-build/composable-kernel/library/src/tensor_operation_instance/gpu/mha/fmha_fwd.hpp:6:
In file included from /root/ROCm/composable_kernel/include/ck_tile/core.hpp:11:
/root/ROCm/composable_kernel/include/ck_tile/core/arch/amd_buffer_addressing.hpp:521:22: error: invalid operand for instruction
  521 |         asm volatile("v_cmpx_le_u32 exec, 1, %4\n"
      |                      ^
<inline asm>:1:16: note: instantiated into assembly here
    1 |         v_cmpx_le_u32 exec, 1, v0
      |                       ^
12 errors generated when compiling for gfx1010.

According to the LLVM assembly documentation, v_cmpx_le_u32 is a supported operand on RDNA1, but it cannot accept three arguments. Because of this, my custom RDNA1 builds do not include composable kernels until the issue is resolved on either here or the MIOpen side.

Jul 19 '25 20:07 TheTrustedComputer

My mistake.

I forgot to use the provided script to replace MIOPEN_USE_COMPOSABLEKERNEL with MIOPEN_USE_CK_*_OPS in the C++ sources, and MIOpen with CK support finally compiled without any linker errors.

Although the patch was intended for 6.3 on Gentoo, it still works with 6.4 on any modern Linux system.

Aug 11 '25 20:08 TheTrustedComputer