Sunny-bot1

7 issues by Sunny-bot1

### PR Category
Performance Optimization

### PR Types
Bug fixes

### Description
pcard-71500 fix fp8 data type bugs on cpu

Hi, the 45_dual_gemm example requires the intermediate output and the final output to be of the same type (D0, D1 and D2 must be the same type). To prevent loss...

question
? - Needs Triage
inactive-30d
inactive-90d

**What is your question?** When I use sm89:

```
int run_attention(Options& options) {
  using Attention = AttentionKernel<
    cutlass::half_t,      // scalar_t
    cutlass::arch::Sm89,  // ArchTag
    true,                 // Memory is aligned
    kQueriesPerBlock,
    kKeysPerBlock,...
```

question
inactive-30d

Hi, when I try to implement a cuBLASLt FP8 batched GEMM with bias based on LtFp8Matmul, I run into this problem:

```
[2024-05-22 07:06:23][cublasLt][62029][Error][cublasLtMatmulAlgoGetHeuristic] Failed to query heuristics. cuBLAS API failed with...
```

cuBLASLt

### PR types ### PR changes ### Description

#### Before submitting

- [ ] Lint code. If there are lint issues, please format the code first.

```shell
# Install and register `pre-commit` in the project folder
pip install...
```

## Motivation

> :bulb: If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending...