[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models
New tests added:
- Llama-3.2-1B: added MMLU benchmark
- Llama-3.1-Nemotron-Nano-8B-v1: added GSM8K, GPQADiamond benchmarks
- Llama-3_1-Nemotron-Ultra-253B-v1: added the entire model (FP8 variant is being added to ftp/llm-models)
- Phi-4-mini-instruct: added the model to the tests; skipped the test as the model likely has to be added to Torch models first (given the current error)
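For reviewers unfamiliar with the pattern, here is a rough sketch of what "add a benchmark" and "add the model but skip the test" amount to. It is not the actual TensorRT-LLM harness: it uses the standard-library `unittest` module for self-containment, and `REFERENCE_ACCURACY`, `check_accuracy`, the model name, and the numbers are all hypothetical placeholders.

```python
import io
import unittest

# Hypothetical reference accuracies keyed by (model, benchmark).
# Placeholder values only; the real references live in the repo.
REFERENCE_ACCURACY = {("example-model", "mmlu"): 50.0}


def check_accuracy(model, benchmark, measured, tolerance=1.0):
    """Return True if `measured` is no more than `tolerance` below the reference."""
    reference = REFERENCE_ACCURACY[(model, benchmark)]
    return measured >= reference - tolerance


class TestExampleModel(unittest.TestCase):
    def test_mmlu(self):
        # In a real test the measured value would come from running the benchmark.
        self.assertTrue(check_accuracy("example-model", "mmlu", measured=49.5))


class TestUnsupportedModel(unittest.TestCase):
    # The test is registered but skipped until the model is supported.
    @unittest.skip("placeholder reason: model not yet supported by the backend")
    def test_mmlu(self):
        self.fail("skipped tests never execute their body")


def run_suite():
    """Run both test classes quietly and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestExampleModel))
    suite.addTests(loader.loadTestsFromTestCase(TestUnsupportedModel))
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_suite()
    print(f"skipped={len(result.skipped)} ok={result.wasSuccessful()}")
```

Keeping the skipped test in the suite, rather than leaving it out, means the skip reason shows up in the test report and the test can be enabled by removing a single decorator.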
/bot run
@syuoni @crazydemo @LarryXFly - can you review this PR? Feel free to unassign yourself and tag someone else instead
/bot run
PR_Github #5121 [ run ] triggered by Bot
PR_Github #5121 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3735 completed with status: 'FAILURE'
/bot run
PR_Github #5205 [ run ] triggered by Bot
PR_Github #5205 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #3798 completed with status: 'FAILURE'
/bot run
PR_Github #5211 [ run ] triggered by Bot
PR_Github #5211 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #3803 completed with status: 'SUCCESS'
/bot run
PR_Github #5627 [ run ] triggered by Bot
PR_Github #5627 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4111 completed with status: 'FAILURE'
/bot run
PR_Github #5747 [ run ] triggered by Bot
PR_Github #5747 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #4203 completed with status: 'FAILURE'
/bot run
PR_Github #5764 [ run ] triggered by Bot
PR_Github #5764 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #4218 completed with status: 'FAILURE'
/bot run
PR_Github #5780 [ run ] triggered by Bot
@syuoni @Tracin @chang-l @tijyojwad Could you review a stacked PR (can't tag you there directly) that fills the remaining gaps? https://github.com/moraxu/TensorRT-LLM/pull/1
I figured it would be easier to merge it into this branch, given the existing accuracy references here.
/bot kill
PR_Github #5781 [ kill ] triggered by Bot
PR_Github #5780 [ run ] completed with state ABORTED
PR_Github #5781 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 2b33c83
/bot run