
Enable TEST04 and TEST05 for SDXL

Open nv-ananjappa opened this issue 1 year ago • 10 comments

We have already enabled TEST01 for SDXL - wasn't mandatory for v4.0 (because the proposal came late), but mandatory for v4.1. https://github.com/mlcommons/inference/pull/1574

NVIDIA has checked internally and SDXL can be enabled for TEST04 and TEST05 too. Can we enable them for v4.1?

@pgmpablo157321

nv-ananjappa avatar Jun 11 '24 16:06 nv-ananjappa

I'm a bit concerned about enabling these tests for Edge systems.

Looking at the v4.0 Edge results, the SingleStream latency ranged from 2 to 13 seconds per sample. LoadGen seems to mandate at least 100 samples for Performance runs for Stable Diffusion XL. (Does anyone know why?) That would be at most around 21.5 minutes per Compliance run, or just over an hour for all three Compliance runs.

However, the Offline throughput ranged from 0.077 QPS to 1.26 QPS. LoadGen mandates 5,000 samples per run. That's up to 18 hours per Compliance run, or over 2 days for all three Compliance runs.


psyhtest avatar Jun 11 '24 16:06 psyhtest

For Datacenter, the Offline throughput ranged from 1.18 QPS to 13.71 QPS. That's up to 75 minutes per single Performance run.
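
The arithmetic in these two comments can be reproduced with a small script. The throughput figures below are taken from the quoted v4.0 results, not from new measurements:

```python
# Back-of-the-envelope run-time estimates for SDXL compliance runs,
# using the worst-case Offline throughputs quoted above.

def run_hours(samples: int, qps: float) -> float:
    """Duration in hours of a run issuing `samples` queries at `qps`."""
    return samples / qps / 3600

# Edge Offline: 5,000 mandated samples at the slowest observed 0.077 QPS
edge_worst = run_hours(5000, 0.077)   # roughly 18 hours per compliance run

# Datacenter Offline: 5,000 samples at the slowest observed 1.18 QPS
dc_worst = run_hours(5000, 1.18)      # just over 70 minutes per run

print(f"Edge worst case: {edge_worst:.1f} h per run, "
      f"{3 * edge_worst / 24:.2f} days for three runs")
print(f"Datacenter worst case: {dc_worst * 60:.0f} min per run")
```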

psyhtest avatar Jun 11 '24 17:06 psyhtest

Although we have not yet decided on this issue, the submission checker already complains about missing TEST04 and TEST05 when the main results and TEST01 are present.

I've done a little digging in the submission checker:

Under the current rules (no TEST04 and TEST05 for stable-diffusion-xl), these bugs can be fixed as follows:

```diff
work_collection/mlperf_inference_git_master$ git diff
diff --git a/tools/submission/submission_checker.py b/tools/submission/submission_checker.py
index b15a859..0673cad 100755
--- a/tools/submission/submission_checker.py
+++ b/tools/submission/submission_checker.py
@@ -2539,6 +2539,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST04")
@@ -2548,7 +2549,7 @@ def check_compliance_dir(
         "gptj-99.9",
         "llama2-70b-99",
         "llama2-70b-99.9",
-        "stable-diffusion-xl"
+        "stable-diffusion-xl",
         "mixtral-8x7b"
     ]:
         test_list.remove("TEST05")
```
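
For readers unfamiliar with the checker, this is a minimal sketch (not the actual `submission_checker.py` code) of the pattern the diff modifies: build the full compliance test list, then drop the tests that do not apply to a given model. The model names are taken from the diff hunks above; the function name is hypothetical.

```python
# Sketch of the exemption logic in check_compliance_dir, under the
# "current rules" interpretation (no TEST04 and no TEST05 for SDXL).

def compliance_tests_for(model: str) -> list[str]:
    test_list = ["TEST01", "TEST04", "TEST05"]
    # Models exempt from TEST04 (first hunk, with stable-diffusion-xl added)
    if model in ["gptj-99.9", "llama2-70b-99", "llama2-70b-99.9",
                 "stable-diffusion-xl", "mixtral-8x7b"]:
        test_list.remove("TEST04")
    # Models exempt from TEST05 (second hunk)
    if model in ["gptj-99.9", "llama2-70b-99", "llama2-70b-99.9",
                 "stable-diffusion-xl", "mixtral-8x7b"]:
        test_list.remove("TEST05")
    return test_list
```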

psyhtest avatar Jun 23 '24 19:06 psyhtest

I agree that the concerns about test time and disk space consumption are legitimate. However, this issue is not limited to SDXL; it is a general challenge for modern generative tasks on edge devices.

The purpose of compliance tests is to ensure the validity of the submission results, which is tangential to the test-time and storage challenge, as those are byproducts of the nature of the workload itself. In the future, if we introduce more generative workloads such as text-to-video, the same problem will still exist.

nvyihengz avatar Jun 24 '24 16:06 nvyihengz

Additional test cases will definitely increase the time to create a submission package severalfold. We could discuss and rethink the effectiveness of TEST05 in the context of SDXL.

kumarans-ai avatar Jul 01 '24 08:07 kumarans-ai

Possible solution:

  1. Do not use TEST05 for SDXL
  2. Reduce TEST04 to 500 samples for SDXL only, in both Datacenter and Edge: `stable-diffusion-xl.Offline.min_query_count = 500`
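
If option 2 is adopted, the override could be expressed as an `mlperf.conf`-style entry. The key itself is quoted from the proposal above; its exact file and placement are hypothetical and would be settled in the implementing PR:

```
# Hypothetical placement of the proposed SDXL Offline sample-count reduction
stable-diffusion-xl.Offline.min_query_count = 500
```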

mrmhodak avatar Jul 02 '24 16:07 mrmhodak

Do we know if this PR will get merged soon? Maybe in tomorrow's meeting? Otherwise, it will delay people from completing their compliance runs in a timely manner.

yeandy avatar Jul 08 '24 18:07 yeandy


Andy,

The plan is to finalize this tomorrow

Miro Hodak Senior Member of Technical Staff, Solutions Architecture (AI/ML) MLPerf Inference co-Chair


mrmhodak avatar Jul 08 '24 20:07 mrmhodak

In the WG meeting, if I recall correctly, the consensus was to run TEST01 and TEST04 (500 samples). The page below still shows TEST05: https://github.com/mlcommons/inference/tree/master/compliance/nvidia#tests-required-for-each-benchmark Could you please clarify this? Thanks.

hans-intel avatar Jul 16 '24 17:07 hans-intel

https://github.com/mlcommons/inference/blob/master/tools/submission/submission_checker.py#L2568 The submission checker does not check TEST05, so we are good. @pgmpablo157321 could you remove TEST05 as a requirement in the compliance directory's README?

nvyihengz avatar Jul 16 '24 18:07 nvyihengz