Gerald
Hi, I see you published the 2-GPU report in the README. Could you please share a script that uses 2 GPUs? Thanks!
Hello, here are my test logs:

```
# command line:
CUDA_VISIBLE_DEVICES=7 ./examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm --m=16 --n=6144 --k=2048 --g=128 --mode=1

# Running results:
Running in group scale mode.
Disposition: Passed
Problem Size: 16x6144x2048x1...
```
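The command above pins the run to a single device with `CUDA_VISIBLE_DEVICES`. A minimal sketch of extending this to two GPUs is to launch one process per device in parallel; the binary path is taken from the log above, while the GPU indices (0 and 1) and the idea of reusing the same problem size on both devices are assumptions, not something stated in the report:

```shell
#!/bin/sh
# Sketch (assumption, not the official 2-GPU script): run the mixed-dtype
# GEMM example once per device by pinning each process to one GPU via
# CUDA_VISIBLE_DEVICES, then wait for both background runs to finish.
BIN=./examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm

for gpu in 0 1; do
  # Each process sees only its assigned GPU as device 0.
  CUDA_VISIBLE_DEVICES=$gpu "$BIN" --m=16 --n=6144 --k=2048 --g=128 --mode=1 &
done
wait  # block until both per-GPU runs complete
```

Note this runs two independent copies of the benchmark rather than one multi-GPU GEMM; the example binary itself is single-device.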
# Purpose

This PR adds inference support for the [MiniMaxText01](https://huggingface.co/MiniMaxAI/MiniMax-Text-01) model. It can run on a single machine with 8xH800 or 8xH20 GPUs, where a single H800 machine can handle...
**What is your question?** Great work! Could you please let me know if the word "half" [here](https://github.com/NVIDIA/cutlass/blob/main/examples/69_hopper_mixed_dtype_grouped_gemm/69_hopper_int4_bf16_grouped_gemm.cu#L123) is a typo or intentionally used with special consideration?
Hi, I tested the file `examples/55_hopper_mixed_dtype_gemm/55_hopper_int4_fp8_gemm.cu`, and it shows about a 10%-20% performance improvement for some input sizes. Is there a plan to support scale-with-zero-point mode? Thank you.