Sunny-bot1
### PR Category Performance Optimization ### PR Types Bug fixes ### Description pcard-71500 fix fp8 data type bugs on cpu
Hi, the 45_dual_gemm example requires the intermediate outputs and the final output to share one type (D0, D1 and D2 must be the same type). To prevent loss...
**What is your question?** when I use sm89 ``` int run_attention(Options& options) { using Attention = AttentionKernel< cutlass::half_t, // scalar_t cutlass::arch::Sm89, // ArchTag true, // Memory is aligned kQueriesPerBlock, kKeysPerBlock,...
Hi, when I tried to implement a cuBLASLt FP8 batched GEMM with bias, based on LtFp8Matmul, I ran into this problem. ``` [2024-05-22 07:06:23][cublasLt][62029][Error][cublasLtMatmulAlgoGetHeuristic] Failed to query heuristics. cuBLAS API failed with...
### PR types ### PR changes ### Description
#### Before submitting - [ ] Lint code. If there are lint issues, please format the code first. ```shell # Install and register `pre-commit` in the project folder pip install...
## Motivation > :bulb: If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending...