Enable TEST04 and TEST05 for SDXL
We have already enabled TEST01 for SDXL: it wasn't mandatory for v4.0 (because the proposal came late), but it is mandatory for v4.1. https://github.com/mlcommons/inference/pull/1574
NVIDIA has checked internally, and SDXL can be enabled for TEST04 and TEST05 too. Can we enable them for v4.1?
@pgmpablo157321
I'm a bit concerned about enabling these tests for Edge systems.
Looking at the v4.0 Edge results, the SingleStream latency ranged from 2 to 13 seconds per sample. LoadGen seems to mandate at least 100 samples for Performance runs of Stable Diffusion XL. (Does anyone know why?) That would be at most about 22 minutes per Compliance run, or just over an hour for all three Compliance runs.
However, the Offline throughput ranged from 0.077 QPS to 1.26 QPS. LoadGen mandates 5,000 samples per run. That's up to 18 hours per Compliance run, or over 2 days for all three Compliance runs.
For Datacenter, the Offline throughput ranged from 1.18 QPS to 13.71 QPS. That's up to about 71 minutes per Performance run.
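For reference, here is a quick back-of-the-envelope script reproducing these estimates; the sample counts and worst-case throughputs are the v4.0 figures quoted above:

```python
# Back-of-the-envelope compliance-run estimates from the v4.0 results above.
SS_SAMPLES = 100            # LoadGen minimum sample count, SingleStream
OFFLINE_SAMPLES = 5_000     # LoadGen minimum sample count, Offline

worst_ss_latency = 13.0     # seconds/sample, slowest v4.0 Edge SingleStream
print(SS_SAMPLES * worst_ss_latency / 60)           # ~21.7 min per run

worst_edge_qps = 0.077      # slowest v4.0 Edge Offline throughput
print(OFFLINE_SAMPLES / worst_edge_qps / 3600)      # ~18.0 h per run

worst_dc_qps = 1.18         # slowest v4.0 Datacenter Offline throughput
print(OFFLINE_SAMPLES / worst_dc_qps / 60)          # ~70.6 min per run
```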
Although we haven't decided on this issue yet, the submission checker already complains about missing TEST04 and TEST05 when the main results and TEST01 are present.
I've done a little digging in the submission checker:
- TEST04 is not excluded;
- TEST05 appears to be excluded, but since there is no comma after `stable-diffusion-xl`, it probably gets concatenated with `mixtral-8x7b` on the next line, resulting in incorrect behaviour (see the snippet below).
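A minimal standalone reproduction of the bug (a hypothetical snippet, not the checker's actual code): Python implicitly concatenates adjacent string literals, so the two model names fuse into one list element.

```python
# Adjacent string literals are concatenated, so the list has two
# elements instead of three.
models = [
    "llama2-70b-99.9",
    "stable-diffusion-xl"   # <-- missing comma
    "mixtral-8x7b"
]
print(models)
# ['llama2-70b-99.9', 'stable-diffusion-xlmixtral-8x7b']

# The membership test therefore fails, so TEST05 is never removed
# from test_list and the checker keeps demanding it for SDXL.
print("stable-diffusion-xl" in models)   # False
```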
Under the current rules (neither TEST04 nor TEST05 for stable-diffusion-xl), these bugs can be fixed as follows:
```diff
work_collection/mlperf_inference_git_master$ git diff
diff --git a/tools/submission/submission_checker.py b/tools/submission/submission_checker.py
index b15a859..0673cad 100755
--- a/tools/submission/submission_checker.py
+++ b/tools/submission/submission_checker.py
@@ -2539,6 +2539,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST04")
@@ -2548,7 +2549,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
-        "stable-diffusion-xl"
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST05")
```
I agree that the concerns about test time and disk space consumption are legitimate. However, this issue is not limited to SDXL; it is a general challenge for modern generative tasks on edge devices.
The purpose of the compliance tests is to ensure the validity of the submission results, which is tangential to the test time and storage challenge; those costs are byproducts of the nature of the workload itself. If we introduce more generative workloads in the future, such as text-to-video, the same problem will persist.
Additional test cases will certainly increase the time needed to create a submission package severalfold. We could discuss and rethink the effectiveness of TEST05 in the context of SDXL.
Possible solution:
- Do not use TEST05 for SDXL
- Reduce TEST04 to 500 samples for SDXL only, in both Datacenter and Edge: `stable-diffusion-xl.Offline.min_query_count = 500` (see the sketch below)
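If adopted, the override might look like the following sketch. It uses LoadGen's `model.scenario.key = value` config syntax from the proposal above; the exact file it would live in (e.g. TEST04's audit.config) is an assumption:

```
# Hypothetical TEST04 override, applying to SDXL Offline runs only
# (file placement is an assumption):
stable-diffusion-xl.Offline.min_query_count = 500
```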
Do we know if this PR will get merged soon? Maybe in tomorrow's meeting? Otherwise, it will delay people from completing their compliance runs in a timely manner.
Andy,
The plan is to finalize this tomorrow.
Miro Hodak, Senior Member of Technical Staff, Solutions Architecture (AI/ML), MLPerf Inference co-Chair
In the WG meeting, if I recall correctly, the consensus was to run TEST01 and TEST04 (500 samples). The page below still shows TEST05: https://github.com/mlcommons/inference/tree/master/compliance/nvidia#tests-required-for-each-benchmark Could you please clarify this? Thanks.
https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L2568 The submission checker does not check TEST05, so we are good. @pgmpablo157321, could you remove TEST05 as a requirement in the compliance directory's README?
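For anyone verifying, here is a condensed, simplified sketch of the test-selection logic in check_compliance_dir as it now stands; the model list mirrors the diff earlier in this thread:

```python
# Simplified sketch of check_compliance_dir's per-model test selection.
model = "stable-diffusion-xl"
test_list = ["TEST01", "TEST04", "TEST05"]

# Models for which TEST05 is not required:
if model in [
    "gptj-99",
    "gptj-99.9",
    "llama2-70b-99",
    "llama2-70b-99.9",
    "stable-diffusion-xl",
    "mixtral-8x7b",
]:
    test_list.remove("TEST05")

print(test_list)  # ['TEST01', 'TEST04'] -- matching the WG consensus
                  # (TEST01 plus TEST04 at 500 samples)
```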