[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120)
Description
- This PR adds end-to-end tests for Mixtral 8x7B FP8 to the TensorRT-LLM (with torch backend) test suite, to be run on SM120.
- Cutlass MoE GEMM did not support FP8 on SM120. To make this work, the cutlass MoE GEMM now uses the Ada (SM89) kernels for FP8 MoE GEMM on SM120.
- The tests will be used by QA as a part of the B40 Bring-up (RTX6000 Pro SM120) effort.
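The fallback described above can be pictured as a small dispatch rule. This is only a minimal sketch in Python for illustration: the function name `select_moe_gemm_kernel_sm` and its parameters are hypothetical and are not TensorRT-LLM APIs, and the actual change lives in the cutlass MoE GEMM code.

```python
# Hypothetical sketch of the dispatch rule; not the actual TensorRT-LLM code.
ADA_SM = 89        # Ada architecture, whose FP8 MoE GEMM kernels are reused
RTX_PRO_SM = 120   # RTX6000 Pro architecture targeted by this PR


def select_moe_gemm_kernel_sm(device_sm: int, use_fp8: bool) -> int:
    """Return the SM version whose kernels should serve this MoE GEMM."""
    if device_sm == RTX_PRO_SM and use_fp8:
        # Assumption from the PR text: FP8 on SM120 has no dedicated
        # MoE GEMM kernel, so the Ada (SM89) kernels are used instead.
        return ADA_SM
    # Every other combination keeps the kernels of the device's own SM.
    return device_sm
```

Under these assumptions, only the FP8 path on SM120 is redirected; other data types and other architectures are left unchanged.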
Test Coverage
Single node tests
- test_e2e.py::test_ptp_quickstart_advanced[Llama3.1-8B-BF16-llama-3.1-model/Meta-Llama-3.1-8B]
These tests will be included in the SM120 verification plan for QA sign-off.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message.
See details below for each supported subcommand.
run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]
Launch build/test pipelines. All previously running jobs will be killed.
--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run L0 pre-merge pipeline.
--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".
kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
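To make the `run` options above concrete, here is a minimal sketch of how a few of them could be modeled with Python's `argparse`. The bot's real implementation is not shown in this PR, so the parser layout and the helper `split_stages` are assumptions for illustration only, and just a subset of the documented flags is included.

```python
import argparse


def build_run_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a subset of the documented `run` flags."""
    parser = argparse.ArgumentParser(prog="/bot run")
    parser.add_argument("--disable-fail-fast", action="store_true")
    parser.add_argument("--skip-test", action="store_true")
    parser.add_argument("--post-merge", action="store_true")
    # These options take exactly one value: a comma-separated string.
    parser.add_argument("--stage-list", default=None)
    parser.add_argument("--gpu-type", default=None)
    return parser


def split_stages(value: str) -> list[str]:
    """Split a comma-separated option value into trimmed, non-empty names."""
    return [name.strip() for name in value.split(",") if name.strip()]


if __name__ == "__main__":
    args = build_run_parser().parse_args(
        ["--stage-list", "RTXPro6000-PyTorch-[Post-Merge]-1"]
    )
    print(split_stages(args.stage_list))
```

With a parser of this shape, passing `--stage-list` without a value is rejected by `argparse` with an "expected one argument" error, so the stage name must always follow the flag.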
/bot run
PR_Github #5238 [ run ] triggered by Bot
PR_Github #5238 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3826 completed with status: 'SUCCESS'
/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"
PR_Github #5381 [ run ] triggered by Bot
PR_Github #5381 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3926 (Partly Tested) completed with status: 'FAILURE'
/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"
PR_Github #5507 [ run ] triggered by Bot
PR_Github #5507 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4013 (Partly Tested) completed with status: 'FAILURE'
/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"
PR_Github #5529 [ run ] triggered by Bot
PR_Github #5529 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4030 (Partly Tested) completed with status: 'SUCCESS'
/bot run
PR_Github #5538 [ run ] triggered by Bot
PR_Github #5538 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4039 completed with status: 'FAILURE'
/bot run --stage-list
PR_Github #5572 Bot args parsing error: usage: /bot run [--reuse-test [optionalpipeline-id]] [--disable-fail-fast] [--skip-test] [--stage-list "A10-1, xxx"] [--gpu-type "A30, H100_PCIe"] [--test-backend "pytorch, cpp"] [--multi-gpu-test] [--add-multi-gpu-test] [--only-multi-gpu-test] [--disable-multi-gpu-test] [--post-merge] [--extra-stage "H100_PCIe-[Post-Merge]-1, xxx"] [--memory-profiling] [--disable-incremental-build] [--enable-publish-last-known-good] [--debug] /bot run: error: argument --stage-list: expected one argument
/bot run
PR_Github #5574 [ run ] triggered by Bot
PR_Github #5574 [ run ] completed with state FAILURE
/bot run
PR_Github #5617 [ run ] triggered by Bot
PR_Github #5617 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4102 completed with status: 'FAILURE'
/bot run --disable-fail-fast
PR_Github #5630 [ run ] triggered by Bot
PR_Github #5630 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4114 completed with status: 'SUCCESS'
/bot run --stage-list "RTXPro6000-PyTorch-[Post-Merge]-1"
PR_Github #5640 [ run ] triggered by Bot
PR_Github #5640 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4122 (Partly Tested) completed with status: 'SUCCESS'
/bot reuse-pipeline