Enable TEST04 and TEST05 for SDXL
We have already enabled TEST01 for SDXL: it wasn't mandatory for v4.0 (because the proposal came late), but it is mandatory for v4.1. https://github.com/mlcommons/inference/pull/1574
NVIDIA has checked internally, and SDXL can be enabled for TEST04 and TEST05 too. Can we enable them for v4.1?
@pgmpablo157321
I'm a bit concerned about enabling these tests for Edge systems.
Looking at the v4.0 Edge results, the SingleStream latency ranged from 2 to 13 seconds per sample. LoadGen seems to mandate at least 100 samples for Performance runs of Stable Diffusion XL. (Does anyone know why?) That would be at most about 22 minutes per Compliance run, or just over an hour for all three Compliance runs.
However, the Offline throughput ranged from 0.077 QPS to 1.26 QPS. LoadGen mandates 5,000 samples per run. That's up to 18 hours per Compliance run, or over 2 days for all three Compliance runs.
For Datacenter, the Offline throughput ranged from 1.18 QPS to 13.71 QPS. That's up to about 71 minutes per Performance run.
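For reference, here is a quick back-of-the-envelope script reproducing these estimates; the sample counts and worst-case throughputs are the v4.0 figures quoted above:

```python
# Back-of-the-envelope compliance-run estimates from the v4.0 results above.
SS_SAMPLES = 100            # LoadGen minimum sample count, SingleStream
OFFLINE_SAMPLES = 5_000     # LoadGen minimum sample count, Offline

worst_ss_latency = 13.0     # seconds/sample, slowest v4.0 Edge SingleStream
print(SS_SAMPLES * worst_ss_latency / 60)           # ~21.7 min per run

worst_edge_qps = 0.077      # slowest v4.0 Edge Offline throughput
print(OFFLINE_SAMPLES / worst_edge_qps / 3600)      # ~18.0 h per run

worst_dc_qps = 1.18         # slowest v4.0 Datacenter Offline throughput
print(OFFLINE_SAMPLES / worst_dc_qps / 60)          # ~70.6 min per run
```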
Although we haven't decided on this issue yet, the submission checker already complains about missing TEST04 and TEST05 when the main results and TEST01 are present.
I've done a little digging in the submission checker:
- TEST04 is not excluded;
- TEST05 appears to be excluded, but since there is no comma after `stable-diffusion-xl`, it probably gets concatenated with `mixtral-8x7b` on the next line, resulting in incorrect behaviour (see the snippet below).
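A minimal standalone reproduction of the bug (a hypothetical snippet, not the checker's actual code): Python implicitly concatenates adjacent string literals, so the two model names fuse into one list element.

```python
# Adjacent string literals are concatenated, so the list has two
# elements instead of three.
models = [
    "llama2-70b-99.9",
    "stable-diffusion-xl"   # <-- missing comma
    "mixtral-8x7b"
]
print(models)
# ['llama2-70b-99.9', 'stable-diffusion-xlmixtral-8x7b']

# The membership test therefore fails, so TEST05 is never removed
# from test_list and the checker keeps demanding it for SDXL.
print("stable-diffusion-xl" in models)   # False
```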
Under the current rules (neither TEST04 nor TEST05 for stable-diffusion-xl), these bugs can be fixed as follows:
```diff
work_collection/mlperf_inference_git_master$ git diff
diff --git a/tools/submission/submission_checker.py b/tools/submission/submission_checker.py
index b15a859..0673cad 100755
--- a/tools/submission/submission_checker.py
+++ b/tools/submission/submission_checker.py
@@ -2539,6 +2539,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST04")
@@ -2548,7 +2549,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
-        "stable-diffusion-xl"
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST05")
```
I agree that the concerns about test time and disk space consumption are legitimate. However, this issue is not limited to SDXL; it is a general challenge for modern generative tasks on edge devices.
The purpose of the compliance tests is to ensure the validity of the submission results, which is tangential to the test time and storage challenge; those costs are byproducts of the nature of the workload itself. If we introduce more generative workloads in the future, such as text-to-video, the same problem will persist.
Additional test cases will certainly increase the time needed to create a submission package severalfold. We could discuss and rethink the effectiveness of TEST05 in the context of SDXL.
Possible solution:
- Do not use TEST05 for SDXL
- Reduce TEST04 to 500 samples for SDXL only, in both Datacenter and Edge: `stable-diffusion-xl.Offline.min_query_count = 500` (see the sketch below)
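If adopted, the override might look like the following sketch. It uses LoadGen's `model.scenario.key = value` config syntax from the proposal above; the exact file it would live in (e.g. TEST04's audit.config) is an assumption:

```
# Hypothetical TEST04 override, applying to SDXL Offline runs only
# (file placement is an assumption):
stable-diffusion-xl.Offline.min_query_count = 500
```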
Do we know if this PR will get merged soon? Maybe in tomorrow's meeting? Otherwise, it will delay people from completing their compliance runs in a timely manner.
Andy,
The plan is to finalize this tomorrow.
Miro Hodak, Senior Member of Technical Staff, Solutions Architecture (AI/ML), MLPerf Inference co-Chair
In the WG meeting, if I recall correctly, the consensus was to run TEST01 and TEST04 (500 samples). The page below still shows TEST05: https://github.com/mlcommons/inference/tree/master/compliance/nvidia#tests-required-for-each-benchmark Could you please clarify this? Thanks.
https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L2568 The submission checker does not check TEST05, so we are good. @pgmpablo157321, could you remove TEST05 as a requirement in the compliance directory's README?
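For anyone verifying, here is a condensed, simplified sketch of the test-selection logic in check_compliance_dir as it now stands; the model list mirrors the diff earlier in this thread:

```python
# Simplified sketch of check_compliance_dir's per-model test selection.
model = "stable-diffusion-xl"
test_list = ["TEST01", "TEST04", "TEST05"]

# Models for which TEST05 is not required:
if model in [
    "gptj-99",
    "gptj-99.9",
    "llama2-70b-99",
    "llama2-70b-99.9",
    "stable-diffusion-xl",
    "mixtral-8x7b",
]:
    test_list.remove("TEST05")

print(test_list)  # ['TEST01', 'TEST04'] -- matching the WG consensus
                  # (TEST01 plus TEST04 at 500 samples)
```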