[QST] 41_fused_multi_head_attention on sm89
What is your question?
When I build example 41 for sm89 with the following kernel instantiation:
int run_attention(Options& options) {
  using Attention = AttentionKernel<
      cutlass::half_t,      // scalar_t
      cutlass::arch::Sm89,  // ArchTag
      true,                 // Memory is aligned
      kQueriesPerBlock,
      kKeysPerBlock,
      kMaxK,
      false,                // Supports dropout
      false                 // Supports bias
  >;
I get the following compile error:
Building CUDA object examples/41_fused_multi_head_attention/CMakeFiles/41_fused_multi_head_attention_fixed_seqlen.dir/fused_multihead_attention_fixed_seqlen.cu.o
/cutlass/examples/41_fused_multi_head_attention/kernel_forward.h(409): error: incomplete type is not allowed
kIsAligned ? DefaultConfig::kAlignmentA : GemmType::kMinimumAlignment;
^
detected during:
instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_>::MM0 [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 418
instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_>::MM0 [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 1161
instantiation of class "AttentionKernel<scalar_t_, ArchTag, isAligned_, kQueriesPerBlock_, kKeysPerBlock_, kMaxK_, kSupportsDropout_, kSupportsBias_, ToBatchHookType_> [with scalar_t_=cutlass::half_t, ArchTag=cutlass::arch::Sm89, isAligned_=true, kQueriesPerBlock_=32, kKeysPerBlock_=128, kMaxK_=128, kSupportsDropout_=false, kSupportsBias_=false, ToBatchHookType_=DefaultToBatchHook]" at line 329 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
instantiation of class "TestbedAttention<Attention> [with Attention=AttentionKernel<cutlass::half_t, cutlass::arch::Sm89, true, 32, 128, 128, false, false, DefaultToBatchHook>]" at line 1032 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
instantiation of "int run_attention<kQueriesPerBlock,kKeysPerBlock,kMaxK>(Options &) [with kQueriesPerBlock=32, kKeysPerBlock=128, kMaxK=128]" at line 1110 of /home/sunxin20/cutlass/examples/41_fused_multi_head_attention/fused_multihead_attention_fixed_seqlen.cu
Just set CUTLASS_NVCC_ARCHS=89 when configuring the build; do not edit this file yourself. There is no partial template specialization for the SM89 arch tag, so the kernel must keep the arch tag the example already uses and simply be compiled for sm_89.
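The suggested fix can be sketched as a CMake reconfigure from the build directory; the target name matches the failing build log above, and the build-directory layout is assumed:

```shell
# Reconfigure CUTLASS to generate code for SM89 (Ada-class GPUs).
# The example source stays unchanged: its kernel keeps the existing
# arch tag and runs on sm_89 hardware once nvcc targets that arch.
cmake .. -DCUTLASS_NVCC_ARCHS=89
make 41_fused_multi_head_attention_fixed_seqlen
```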
I see. Thank you! If I want to support the fp8 input type, what should I pay attention to?