[QST] How to fix the compile error: static assertion failed with "Vectors implied by the thread map must be divisible by the access type."
What is your question? I am trying to change the element type in example '24_gemm_grouped' from cutlass::half_t to double. I have changed ThreadblockShape, WarpShape, and InstructionShape accordingly. This works fine for class GemmBatched:
// Gemm operator cutlass_tensorop_f16_s16816gemm_f16_128x128_32x4_nt_align8
using GemmBatched = cutlass::gemm::device::GemmUniversal<
ElementA, LayoutA,
ElementB, LayoutB,
ElementOutput, LayoutC,
ElementAccumulator,
cutlass::arch::OpClassTensorOp,
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<128, 128, 16>,
cutlass::gemm::GemmShape<32, 64, 16>,
cutlass::gemm::GemmShape<8, 8, 4>,
cutlass::epilogue::thread::LinearCombination<
ElementOutput,
128 / cutlass::sizeof_bits<ElementOutput>::value,
ElementAccumulator,
ElementAccumulator
>,
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<8>,
4
>;
I made a similar change in GemmKernel, but the compiler reports: cutlass-main/include/cutlass/transform/threadblock/predicated_tile_access_iterator.h(360): error: static assertion failed with "Vectors implied by the thread map must be divisible by the access type." How can I fix it?
using GemmKernel = typename cutlass::gemm::kernel::DefaultGemmGrouped<
ElementA,
LayoutA,
cutlass::ComplexTransform::kNone,
8,
ElementB,
LayoutB,
cutlass::ComplexTransform::kNone,
8,
ElementOutput, LayoutC,
ElementAccumulator,
cutlass::arch::OpClassTensorOp,
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<128, 128, 16>,
cutlass::gemm::GemmShape<32, 64, 16>,
cutlass::gemm::GemmShape<8, 8, 4>,
cutlass::epilogue::thread::LinearCombination<
ElementOutput, 128 / cutlass::sizeof_bits<ElementOutput>::value,
ElementAccumulator, ElementAccumulator>,
// NOTE: Threadblock swizzling is currently not supported by CUTLASS's grouped kernels.
// This parameter is passed in at present to match the APIs of other kernels. The parameter
// is unused within the kernel.
cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle,
4>::GemmKernel;
using GemmGrouped = cutlass::gemm::device::GemmGrouped<GemmKernel>;
Thanks
I found that the compilation error can be fixed by changing the template parameters AlignmentA/AlignmentB from 8 to 1. Is there any explanation?
using GemmKernel = typename cutlass::gemm::kernel::DefaultGemmGrouped<
ElementA,
LayoutA,
cutlass::ComplexTransform::kNone,
1,
ElementB,
LayoutB,
cutlass::ComplexTransform::kNone,
1,
ElementOutput, LayoutC,
ElementAccumulator,
cutlass::arch::OpClassTensorOp,
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<128, 128, 16>,
cutlass::gemm::GemmShape<32, 64, 16>,
cutlass::gemm::GemmShape<8, 8, 4>,
cutlass::epilogue::thread::LinearCombination<
ElementOutput, 128 / cutlass::sizeof_bits<ElementOutput>::value,
ElementAccumulator, ElementAccumulator>,
// NOTE: Threadblock swizzling is currently not supported by CUTLASS's grouped kernels.
// This parameter is passed in at present to match the APIs of other kernels. The parameter
// is unused within the kernel.
cutlass::gemm::threadblock::GemmBatchedIdentityThreadblockSwizzle,
4>::GemmKernel;
using GemmGrouped = cutlass::gemm::device::GemmGrouped<GemmKernel>;
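A possible explanation, as far as I understand it (nobody in this thread has confirmed it): AlignmentA/AlignmentB are counted in elements, not bytes. With cutlass::half_t, 8 elements is a 128-bit access, but with double, 8 elements would be a 512-bit access. The failing assertion appears to require that the number of elements per access implied by the thread map is a multiple of the number of elements in the access type. The standalone sketch below models that arithmetic without including CUTLASS. The helper names and the per-type access widths (128 bits for the half_t path, 64 bits for the double path) are my assumptions for illustration, not values taken from the CUTLASS headers.

```cpp
// Simplified model of the alignment arithmetic; this is NOT CUTLASS code.
// All names below are hypothetical helpers for illustration.

// Bits per element for the two types discussed in this thread.
constexpr int kHalfBits = 16;    // cutlass::half_t
constexpr int kDoubleBits = 64;  // double

// Width in bits of one access that loads `alignment` elements.
constexpr int access_bits(int element_bits, int alignment) {
  return element_bits * alignment;
}

// Elements per access implied by a thread map built for a fixed access width.
// ASSUMPTION: the thread map derives this from an access size in bits.
constexpr int thread_map_elements(int access_size_bits, int element_bits) {
  return access_size_bits / element_bits;
}

// Model of the failing check: the thread map's elements per access must be
// divisible by the number of elements in the access type (the alignment).
constexpr bool access_type_fits(int thread_map_elements_per_access, int alignment) {
  return thread_map_elements_per_access % alignment == 0;
}

// half_t with alignment 8: one 128-bit access, and 8 % 8 == 0.
static_assert(access_bits(kHalfBits, 8) == 128, "half_t x 8 is 128 bits");
static_assert(access_type_fits(thread_map_elements(128, kHalfBits), 8),
              "alignment 8 fits the assumed half_t thread map");

// double with alignment 8: a 512-bit access, and 1 % 8 != 0 (assumed 64-bit map).
static_assert(access_bits(kDoubleBits, 8) == 512, "double x 8 is 512 bits");
static_assert(!access_type_fits(thread_map_elements(64, kDoubleBits), 8),
              "alignment 8 does not fit the assumed double thread map");

// double with alignment 1: 1 % 1 == 0, matching the fix reported above.
static_assert(access_type_fits(thread_map_elements(64, kDoubleBits), 1),
              "alignment 1 fits the assumed double thread map");
```

If this model is right, the alignment value has to be chosen per element type rather than copied from the half_t example, which would explain why 1 compiles for double while 8 does not.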