Gerald issues

Results: 5 issues



How to run inference with 2 GPUs

Hi, I see you published the 2-GPU report in the README. Could you please provide a script that uses 2 GPUs? Thanks!

[QST] Why does hopper-mixed-gemm reach only ~9% memory bandwidth utilization (MBU) on H100 SXM5?

Hello, here are my test logs:
```
# command line:
CUDA_VISIBLE_DEVICES=7 ./examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm --m=16 --n=6144 --k=2048 --g=128 --mode=1
# Running results:
Running in group scale mode.
Disposition: Passed
Problem Size: 16x6144x2048x1...
```

inactive-30d
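For the MBU question, a back-of-the-envelope estimate helps frame the number. The sketch below computes a lower bound on HBM traffic for the reported problem size and the utilization implied by a given runtime. It is a hypothetical helper, not part of CUTLASS, and it rests on assumptions the issue does not confirm: 4-bit quantized weights, 2-byte activations, outputs, and scales, one scale per group of `g` elements along K for each of the N columns, every tensor touching HBM exactly once, and 3.35 TB/s as the H100 SXM5 peak bandwidth.

```python
# Hypothetical MBU estimate for a mixed-dtype GEMM D[m,n] = A[m,k] @ dequant(B[k,n]).
# Assumptions (not confirmed by the issue): B is stored at quant_bits per element,
# A, D, and the group scales at act_bytes / scale_bytes per element, and each
# tensor is read or written from HBM exactly once (ideal reuse).

def gemm_bytes_moved(m, n, k, g, quant_bits=4, act_bytes=2, scale_bytes=2):
    """Lower bound on HBM traffic in bytes."""
    a = m * k * act_bytes                 # activations (A)
    b = n * k * quant_bits // 8           # packed quantized weights (B)
    scales = (k // g) * n * scale_bytes   # one scale per group of g along K, per column
    d = m * n * act_bytes                 # output (D)
    return a + b + scales + d

def mbu(bytes_moved, runtime_s, peak_bw_bytes_per_s=3.35e12):
    """Fraction of peak bandwidth achieved (default peak: H100 SXM5, 3.35 TB/s)."""
    return bytes_moved / runtime_s / peak_bw_bytes_per_s

# Problem size from the log: --m=16 --n=6144 --k=2048 --g=128
total = gemm_bytes_moved(m=16, n=6144, k=2048, g=128)
ideal_s = total / 3.35e12                 # time at 100% of peak bandwidth
print(total)                              # 6750208
print(f"{ideal_s * 1e6:.2f} us")          # 2.01 us
```

Under these assumptions the kernel moves about 6.75 MB, dominated by the packed weights, so even at full bandwidth it would take roughly 2 us. Plugging the measured runtime from the (truncated) log into `mbu` gives the utilization; at such short runtimes, fixed per-kernel costs may account for a large share of the gap.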

[Model][MiniMaxText01] Support MiniMaxText01 model inference

Purpose: This PR adds support for [MiniMaxText01](https://huggingface.co/MiniMaxAI/MiniMax-Text-01) model inference. It can run on a single 8xH800 or 8xH20 machine, where a single H800 machine can handle...

needs-rebase

[QST] Question about example 69

**What is your question?** Great work! Could you please let me know whether the word "half" [here](https://github.com/NVIDIA/cutlass/blob/main/examples/69_hopper_mixed_dtype_grouped_gemm/69_hopper_int4_bf16_grouped_gemm.cu#L123) is a typo or was chosen intentionally?

question

? - Needs Triage

inactive-30d

inactive-90d

[QST] Can hopper_int4_fp8_gemm support Scale with zero-point mode?

Hi, I tested `examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu`, and it shows about a 10%-20% performance improvement for some input sizes. Are there plans to support Scale with zero-point mode? Thank you.

question

? - Needs Triage

inactive-30d
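To make the distinction in the last question concrete, here is a minimal NumPy sketch of what group-wise scale-only and scale-with-zero-point dequantization compute for a K x N quantized weight matrix. The function name is hypothetical and not a CUTLASS API. The zero-point convention shown, `w = scale * (q - zero)`, is one common choice and an assumption here; CUTLASS's own scale-with-zero-point convention may differ (for example, applying a pre-scaled zero as an additive offset).

```python
import numpy as np

# Hypothetical reference for group-wise dequantization of B[K, N].
# Assumption: one scale (and optionally one zero point) per group of g
# consecutive elements along K, for each of the N columns.
def dequantize_groupwise(q, scale, zero=None, g=128):
    """q: (K, N) integer codes; scale and zero: (K // g, N)."""
    k, n = q.shape
    assert k % g == 0 and scale.shape == (k // g, n)
    s = np.repeat(scale, g, axis=0)    # expand each group's scale to its g rows
    if zero is None:
        return s * q                   # scale-only mode
    z = np.repeat(zero, g, axis=0)     # expand each group's zero point likewise
    return s * (q - z)                 # scale-with-zero-point mode (assumed convention)
```

The zero-point variant needs one extra (K // g, N) tensor to be loaded alongside the scales, which is the additional work such a mode would add to the kernel's mainloop.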