TensorRT-LLM [TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120)

Description

This PR adds end-to-end tests for Mixtral 8x7B FP8 to the TensorRT-LLM test suite (PyTorch backend), to be run on SM120.
CUTLASS MoE GEMM did not support FP8 on SM120, so to make these tests work, the CUTLASS MoE GEMM path was changed to use the Ada (SM89) kernels for FP8 MoE GEMM.
The tests will be used by QA as part of the B40 bring-up (RTX6000 Pro, SM120) effort.
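As a rough illustration of the fallback described above, the sketch below expresses it as a kernel-selection rule. It is a hypothetical Python model only: `DType` and `select_moe_gemm_arch` are invented names, and the actual change to the CUTLASS MoE GEMM dispatch is not shown in this description.

```python
from enum import Enum


class DType(Enum):
    # Hypothetical dtype tags, for illustration only.
    BF16 = "bf16"
    FP8 = "fp8"


def select_moe_gemm_arch(sm: int, dtype: DType) -> int:
    """Return the SM architecture whose MoE GEMM kernels to use (hypothetical rule)."""
    if sm == 120 and dtype is DType.FP8:
        # No FP8 MoE GEMM kernels for SM120, so reuse the Ada (SM89) FP8 kernels.
        return 89
    # Otherwise keep the device's own architecture.
    return sm
```

Only the rule stated in this PR is modeled; other architecture and dtype combinations simply pass through unchanged.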

Test Coverage

Single node tests

test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-BF16-llama-3.1-model/Meta-Llama-3.1-8B]

These tests will be included in the SM120 verification plan for QA sign-off.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. This also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
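To make the option grammar above concrete, here is a minimal sketch of how a `/bot run` comment could be tokenized and parsed with Python's `argparse`, with the comma-separated values of `--stage-list`, `--gpu-type` and `--extra-stage` split into lists. This is an assumption for illustration: the bot's real parser is not shown here, `build_run_parser`, `split_list` and `parse_bot_comment` are hypothetical names, and only the flags in the `run` synopsis are modeled.

```python
import argparse
import shlex


def build_run_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the `/bot run` options from the help text."""
    parser = argparse.ArgumentParser(prog="/bot run")
    parser.add_argument("--disable-fail-fast", action="store_true")
    parser.add_argument("--skip-test", action="store_true")
    parser.add_argument("--stage-list", default=None)   # e.g. "A10-1, xxx"
    parser.add_argument("--gpu-type", default=None)     # e.g. "A30, H100_PCIe"
    parser.add_argument("--add-multi-gpu-test", action="store_true")
    parser.add_argument("--only-multi-gpu-test", action="store_true")
    parser.add_argument("--disable-multi-gpu-test", action="store_true")
    parser.add_argument("--post-merge", action="store_true")
    parser.add_argument("--extra-stage", default=None)  # e.g. "H100_PCIe-[Post-Merge]-1, xxx"
    return parser


def split_list(value):
    """Split a comma-separated option value such as "A30, H100_PCIe" into names."""
    if value is None:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_bot_comment(comment: str) -> argparse.Namespace:
    """Parse a PR comment like '/bot run --stage-list "A10-1"' (hypothetical helper)."""
    # shlex honors the double quotes around multi-word values.
    tokens = shlex.split(comment)
    if tokens[:2] != ["/bot", "run"]:
        raise ValueError(f"not a /bot run command: {comment!r}")
    # parse_args prints a usage message and raises SystemExit on invalid options.
    args = build_run_parser().parse_args(tokens[2:])
    args.stage_list = split_list(args.stage_list)
    args.gpu_type = split_list(args.gpu_type)
    args.extra_stage = split_list(args.extra_stage)
    return args
```

For example, `parse_bot_comment('/bot run --gpu-type "A30, H100_PCIe"').gpu_type` yields `["A30", "H100_PCIe"]`, and a single quoted stage such as `"RTXPro6000-PyTorch-[Post-Merge]-1"` becomes a one-element list.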

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

May 15 '25 00:05 farazkh80

/bot run

May 15 '25 02:05 farazkh80

PR_Github #5238 [ run ] triggered by Bot

May 15 '25 02:05 tensorrt-cicd

PR_Github #5238 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3826 completed with status: 'SUCCESS'

May 15 '25 06:05 tensorrt-cicd

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

May 15 '25 17:05 farazkh80

PR_Github #5381 [ run ] triggered by Bot

May 15 '25 17:05 tensorrt-cicd

PR_Github #5381 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #3926 (Partly Tested) completed with status: 'FAILURE'

May 15 '25 19:05 tensorrt-cicd

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

May 16 '25 11:05 farazkh80

PR_Github #5507 [ run ] triggered by Bot

May 16 '25 11:05 tensorrt-cicd

PR_Github #5507 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #4013 (Partly Tested) completed with status: 'FAILURE'

May 16 '25 14:05 tensorrt-cicd

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

May 16 '25 18:05 farazkh80

PR_Github #5529 [ run ] triggered by Bot

May 16 '25 18:05 tensorrt-cicd

PR_Github #5529 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #4030 (Partly Tested) completed with status: 'SUCCESS'

May 16 '25 20:05 tensorrt-cicd

/bot run

May 16 '25 20:05 farazkh80

PR_Github #5538 [ run ] triggered by Bot

May 16 '25 20:05 tensorrt-cicd

PR_Github #5538 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #4039 completed with status: 'FAILURE'

May 16 '25 22:05 tensorrt-cicd

/bot run --stage-list

May 17 '25 14:05 farazkh80

PR_Github #5572 Bot args parsing error: usage: /bot run [--reuse-test [optionalpipeline-id]] [--disable-fail-fast] [--skip-test] [--stage-list "A10-1, xxx"] [--gpu-type "A30, H100_PCIe"] [--test-backend "pytorch, cpp"] [--multi-gpu-test] [--add-multi-gpu-test] [--only-multi-gpu-test] [--disable-multi-gpu-test] [--post-merge] [--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"] [--memory-profiling] [--disable-incremental-build] [--enable-publish-last-known-good] [--debug] /bot run: error: argument --stage-list: expected one argument

May 17 '25 14:05 tensorrt-cicd
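The parsing error above comes from passing `--stage-list` with no value: the option expects exactly one (quoted, comma-separated) argument. The minimal sketch below reproduces that behavior with Python's `argparse`, assuming an argparse-style parser like the one the usage message suggests; `BotArgumentParser` and `check_run_args` are hypothetical names, and only `--stage-list` is modeled.

```python
import argparse


class BotArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting, so a bot could reply with the message."""

    def error(self, message):
        raise ValueError(message)


def check_run_args(argv):
    """Return an error string for invalid `/bot run` arguments, or None if they parse."""
    parser = BotArgumentParser(prog="/bot run")
    # A single-valued option: a bare "--stage-list" is rejected.
    parser.add_argument("--stage-list")
    try:
        parser.parse_args(argv)
    except ValueError as exc:
        return f"/bot run: error: {exc}"
    return None
```

`check_run_args(["--stage-list"])` returns an error mentioning `argument --stage-list: expected one argument`, while `check_run_args(["--stage-list", "RTXPro6000-PyTorch-[Post-Merge]-1"])` returns `None`.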

/bot run

May 17 '25 14:05 farazkh80

PR_Github #5574 [ run ] triggered by Bot

May 17 '25 14:05 tensorrt-cicd

PR_Github #5574 [ run ] completed with state FAILURE

May 17 '25 14:05 tensorrt-cicd

/bot run

May 18 '25 14:05 farazkh80

PR_Github #5617 [ run ] triggered by Bot

May 18 '25 14:05 tensorrt-cicd

PR_Github #5617 [ run ] completed with state FAILURE /LLM/main/L0_MergeRequest_PR pipeline #4102 completed with status: 'FAILURE'

May 18 '25 16:05 tensorrt-cicd

/bot run --disable-fail-fast

May 18 '25 21:05 farazkh80

PR_Github #5630 [ run ] triggered by Bot

May 18 '25 21:05 tensorrt-cicd

PR_Github #5630 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #4114 completed with status: 'SUCCESS'

May 19 '25 00:05 tensorrt-cicd

/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"

May 19 '25 00:05 farazkh80

PR_Github #5640 [ run ] triggered by Bot

May 19 '25 00:05 tensorrt-cicd

PR_Github #5640 [ run ] completed with state SUCCESS /LLM/main/L0_MergeRequest_PR pipeline #4122 (Partly Tested) completed with status: 'SUCCESS'

May 19 '25 01:05 tensorrt-cicd

/bot reuse-pipeline

May 19 '25 01:05 farazkh80