kahakuka
MMCV_WITH_OPS=1 python3 setup.py -v bdist_wheel
@teamwong111 Yes, just as you said. But I want to provide a ".whl" package. Is there any other way? The solution is to change "include_package_data" to false in setup. The compilation...
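For anyone trying the same change, here is a rough sketch of what setting "include_package_data" to false could look like. The package name, version, and the helper `build_setup_kwargs` are hypothetical placeholders, not taken from the actual mmcv setup.py; only the `include_package_data` keyword itself is a real setuptools option.

```python
def build_setup_kwargs():
    """Return keyword arguments for setuptools.setup().

    Hypothetical helper for illustration; the name and version below
    are placeholders, not the real project metadata.
    """
    return {
        "name": "example_pkg",          # placeholder package name
        "version": "0.0.1",             # placeholder version
        # The change described above: do not pull in extra data files
        # automatically when building the wheel.
        "include_package_data": False,
    }


# In a real setup.py the dict would be passed on like this:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

The wheel would then be built with the same command as above (`python3 setup.py -v bdist_wheel`).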
The template is as follows: using ALayout = Row; using BLayout = Row; using DLayout = Row; using ELayout = Row; using DeviceOpInstance = ck::tensor_operation::device::DeviceGemmMultipleABD_Xdl_CShuffle< ck::Tuple, ck::Tuple, ck::Tuple, ELayout, ck::Tuple,...
@zjing14 Can you help me answer this?
@zjing14 Thank you for your answer. GEMM case: 60_gemm_multi_ABD. I referred to the modifications in your PR (https://github.com/ROCm/composable_kernel/pull/978) and added the function TransposeFromElmToDst to implement it. The layout of B is row, which...
@zjing14 Yes. However, when B takes 8 at a time, the performance is very poor. Change 1, 2, 8 here to 1, 8, 8.
@zjing14 Thank you for your answer. It does not use int4x2. It quantizes fp16 into int4 according to a certain pattern and stores the result in a uint32 type. The paper introduces it this way.
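To make the uint32 storage concrete, here is a minimal sketch of packing eight 4-bit values into one 32-bit word and unpacking them again. It assumes a plain sequential nibble order (value i in bits 4*i to 4*i+3) and a simple scale/zero-point dequantization; the actual pattern used in the paper may interleave the values differently, and the function names are made up for illustration.

```python
def pack_int4x8(values):
    """Pack eight unsigned 4-bit integers (0..15) into one 32-bit word."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v <= 15:
            raise ValueError("value out of 4-bit range")
        # Assumption: value i occupies bits [4*i, 4*i + 3]. A real kernel
        # may use a different (interleaved) order.
        word |= v << (4 * i)
    return word


def unpack_int4x8(word):
    """Recover the eight 4-bit values from a packed 32-bit word."""
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(word, scale, zero_point):
    """Map packed int4 values back to floats: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in unpack_int4x8(word)]
```

For example, `pack_int4x8([1, 2, 3, 4, 5, 6, 7, 8])` gives `0x87654321`, and `unpack_int4x8` returns the original list.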
@zjing14 Is it easy to integrate int4 + GEMM into composable_kernel by referring to the mma method in llm-awq? llm-awq: The int4 dequantization processing can refer to this...
@ilmarkov Hello, I am on ROCm version 5.7. The 'hipIpcMemLazyEnablePeerAccess' flag in the following function needs to be '0'. The accuracy is incorrect after changing it to '0'. Excuse...
> @kahakuka Thank you for your message! Are you using an MI200 GPU? Yes, I also tried it on MI250, and the result is the same.