[Issue]: Slow model compilation
Problem Description
ONNX models take an exceedingly long time to compile when MIGraphX is used as the Execution Provider.
Operating System
Ubuntu 24.04.2 LTS (Noble Numbat) (WSL2 through Windows 11 24H2 26100.4652)
CPU
AMD Ryzen 5 5600X
GPU
AMD Radeon RX 9070 XT
ROCm Version
ROCm 6.4.2
Steps to Reproduce
Use ONNX Runtime with MIGraphX as the Execution Provider and give it a model that isn't cached.
I've tested with this SISR model and a 960x540 image as the input:
2025-07-23 16:08:18.772574132 [W:onnxruntime:Default, migraphx_execution_provider.cc:1298 compile_program] Model Compile: Begin
2025-07-23 16:43:41.864474252 [W:onnxruntime:Default, migraphx_execution_provider.cc:1303 compile_program] Model Compile: Complete
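Going by the two timestamps above, the compile step took roughly 35 minutes 23 seconds. Below is a small Python sketch for reading that interval straight off ONNX Runtime log lines. It assumes the `YYYY-MM-DD HH:MM:SS.nnnnnnnnn` prefix seen in these logs, and the helper names `ort_log_time` and `compile_seconds` are illustrative only, not part of any library.

```python
from datetime import datetime


def ort_log_time(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.nnnnnnnnn' timestamp of an ONNX Runtime log line.

    datetime only stores microseconds, so the nanosecond digits are truncated to six.
    """
    date, clock = line.split()[:2]
    hms, frac = clock.split(".")
    return datetime.strptime(f"{date} {hms}.{frac[:6]}", "%Y-%m-%d %H:%M:%S.%f")


def compile_seconds(begin_line, end_line):
    """Seconds between a 'Model Compile: Begin' line and its 'Model Compile: Complete' line."""
    return (ort_log_time(end_line) - ort_log_time(begin_line)).total_seconds()
```

Applied to the Begin/Complete pair above, `compile_seconds` returns about 2123.09 seconds.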
I've been told the corresponding operation takes less than a minute when using Nvidia's TensorRT as the Execution Provider on a similar CPU.
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
/opt/rocm/bin/rocminfo --support
WSL environment detected.
HSA System Attributes
Runtime Version: 1.1
Runtime Ext Version: 1.7
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
XNACK enabled: NO
DMAbuf Support: YES
VMM Support: YES
========== HSA Agents
Agent 1
  Name: AMD Ryzen 5 5600X 6-Core Processor
  Uuid: CPU-XX
  Marketing Name: AMD Ryzen 5 5600X 6-Core Processor
  Vendor Name: CPU
  Feature: None specified
  Profile: FULL_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 0(0x0)
  Queue Min Size: 0(0x0)
  Queue Max Size: 0(0x0)
  Queue Type: MULTI
  Node: 0
  Device Type: CPU
  Cache Info:
    L1: 32768(0x8000) KB
  Chip ID: 0(0x0)
  Cacheline Size: 64(0x40)
  Internal Node ID: 0
  Compute Unit: 12
  SIMDs per CU: 0
  Shader Engines: 0
  Shader Arrs. per Eng.: 0
  Memory Properties:
  Features: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: FINE GRAINED
      Size: 16325832(0xf91cc8) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 2
      Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size: 16325832(0xf91cc8) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 3
      Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size: 16325832(0xf91cc8) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
    Pool 4
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 16325832(0xf91cc8) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment: 4KB
      Accessible by all: TRUE
  ISA Info:
Agent 2
  Name: gfx1201
  Marketing Name: AMD Radeon RX 9070 XT
  Vendor Name: AMD
  Feature: KERNEL_DISPATCH
  Profile: BASE_PROFILE
  Float Round Mode: NEAR
  Max Queue Number: 128(0x80)
  Queue Min Size: 64(0x40)
  Queue Max Size: 131072(0x20000)
  Queue Type: MULTI
  Node: 1
  Device Type: GPU
  Cache Info:
    L1: 32(0x20) KB
    L3: 65536(0x10000) KB
  Chip ID: 30032(0x7550)
  Cacheline Size: 64(0x40)
  Max Clock Freq. (MHz): 2400
  Internal Node ID: 1
  Compute Unit: 64
  SIMDs per CU: 2
  Shader Engines: 4
  Shader Arrs. per Eng.: 2
  Coherent Host Access: FALSE
  Memory Properties:
  Features: KERNEL_DISPATCH
  Fast F16 Operation: TRUE
  Wavefront Size: 32(0x20)
  Workgroup Max Size: 1024(0x400)
  Workgroup Max Size per Dimension:
    x 1024(0x400)
    y 1024(0x400)
    z 1024(0x400)
  Max Waves Per CU: 32(0x20)
  Max Work-item Per CU: 1024(0x400)
  Grid Max Size: 4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x 4294967295(0xffffffff)
    y 4294967295(0xffffffff)
    z 4294967295(0xffffffff)
  Max fbarriers/Workgrp: 32
  Packet Processor uCode:: 1012
  SDMA engine uCode:: 0
  IOMMU Support:: None
  Pool Info:
    Pool 1
      Segment: GLOBAL; FLAGS: COARSE GRAINED
      Size: 16695296(0xfec000) KB
      Allocatable: TRUE
      Alloc Granule: 4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment: 4KB
      Accessible by all: FALSE
    Pool 2
      Segment: GROUP
      Size: 64(0x40) KB
      Allocatable: FALSE
      Alloc Granule: 0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment: 0KB
      Accessible by all: FALSE
  ISA Info:
    ISA 1
      Name: amdgcn-amd-amdhsa--gfx1201
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
    ISA 2
      Name: amdgcn-amd-amdhsa--gfx12-generic
      Machine Models: HSA_MACHINE_MODEL_LARGE
      Profiles: HSA_PROFILE_BASE
      Default Rounding Mode: NEAR
      Default Rounding Mode: NEAR
      Fast f16: TRUE
      Workgroup Max Size: 1024(0x400)
      Workgroup Max Size per Dimension:
        x 1024(0x400)
        y 1024(0x400)
        z 1024(0x400)
      Grid Max Size: 4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x 4294967295(0xffffffff)
        y 4294967295(0xffffffff)
        z 4294967295(0xffffffff)
      FBarrier Max Size: 32
*** Done ***
Additional Information
No response
I can confirm this with the same model on a native Linux system (Arch with the linux-zen 6.15.7 kernel, ROCm 6.4.1) running a Ryzen 9800X3D with a Radeon 7900 XTX: compiling the linked model for a 1920x1080 image takes approximately 50 minutes. A notable, possibly related observation is that although the source model is only 3.5 MB, the compiled output is 2 GB.
The problem becomes significantly worse with a larger model such as RIFE v4.25 heavy. That ONNX model is 86.7 MB; I let MIGraphX's model compilation run for 4 hours before giving up and killing the process, which was using 42 GB of RAM. The current compilation method does not appear to scale.
Hi @Artoriuz. An internal ticket has been created to investigate this issue. Thanks!
If we could get the log with the environment variables MIGRAPHX_TIME_MATCHERS=1 and MIGRAPHX_TIME_PASSES=1 set, that would be helpful. Thanks!
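One way to collect that log is to launch the inference script in a child process whose environment already has both variables set, so they are in place before ONNX Runtime and MIGraphX are loaded. This is a sketch, not an official procedure: `run_with_pass_timing` is a hypothetical helper, `run_model.py` stands in for whatever script creates the session with `MIGraphXExecutionProvider`, and since it is not assumed here whether MIGraphX prints the timings to stdout or stderr, both are captured. From a shell, the equivalent is `MIGRAPHX_TIME_MATCHERS=1 MIGRAPHX_TIME_PASSES=1 python run_model.py`.

```python
import os
import subprocess
import sys


def run_with_pass_timing(args):
    """Run the current Python interpreter with `args`, with MIGraphX timing variables set.

    The variables go into the child's environment before it starts, so they are already
    present when the child imports onnxruntime. stdout and stderr are both captured
    because this sketch does not assume which stream the timings are written to.
    """
    env = dict(os.environ, MIGRAPHX_TIME_MATCHERS="1", MIGRAPHX_TIME_PASSES="1")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )


# Hypothetical usage: run_model.py builds an onnxruntime.InferenceSession with
# providers=["MIGraphXExecutionProvider"] and runs one inference.
# result = run_with_pass_timing(["run_model.py"])
# print(result.stdout, result.stderr)
```

For a compile that runs for half an hour, redirecting the output to a file instead of capturing it in memory may be more practical.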
2025-07-24 12:38:31.888348807 [W:onnxruntime:Default, migraphx_execution_provider.cc:168 get_flags_from_env] [MIGraphX EP] MIGraphX ENV Override Variables Set:
2025-07-24 12:38:32.459799512 [W:onnxruntime:Default, migraphx_execution_provider.cc:1298 compile_program] Model Compile: Begin
split_single_dyn_dim: 0.012187ms
dead_code_elimination: 0.005583ms
simplify_dyn_ops: 1.1246ms
dead_code_elimination: 0.005152ms
normalize_ops: 0.543543ms
dead_code_elimination: 0.004876ms
eliminate_identity: 0.004022ms
dead_code_elimination: 0.004105ms
id: 0.000156ms
id: 9.1e-05ms
dead_code_elimination: 0.004721ms
dead_code_elimination: 0.00405ms
dead_code_elimination: 0.003959ms
dead_code_elimination: 0.004234ms
simplify_qdq: 4.6179ms
id: 6.4e-05ms
dead_code_elimination: 0.003894ms
eliminate_data_type: 0.013096ms
eliminate_data_type: 0.021086ms
simplify_reshapes: 7.90211ms
eliminate_identity: 0.004399ms
eliminate_pad: 0.00484ms
dead_code_elimination: 0.004564ms
insert_pad: 0.069962ms
dead_code_elimination: 0.004445ms
rewrite_rnn: 0.153783ms
dead_code_elimination: 0.004353ms
inline_module: 0.003307ms
rewrite_pooling: 0.003408ms
dead_code_elimination: 0.003949ms
rewrite_gelu: 334.98ms
simplify_reshapes: 6.08728ms
eliminate_convert: 0.034246ms
dead_code_elimination: 0.004482ms
simplify_algebra: 178.789ms
simplify_reshapes: 7.93819ms
eliminate_convert: 0.035091ms
dead_code_elimination: 0.006088ms
simplify_algebra: 21.3288ms
simplify_reshapes: 7.86051ms
eliminate_convert: 0.035119ms
dead_code_elimination: 0.00608ms
simplify_algebra: 20.7469ms
simplify_reshapes: 7.88351ms
eliminate_convert: 0.035716ms
dead_code_elimination: 0.005988ms
simplify_algebra: 21.5964ms
eliminate_common_subexpression: 0.137739ms
dead_code_elimination: 0.00833ms
propagate_constant: 1.9325e+06ms
dead_code_elimination: 0.118719ms
simplify_reshapes: 4.86823ms
eliminate_convert: 0.025659ms
dead_code_elimination: 0.004207ms
simplify_algebra: 15.0816ms
eliminate_common_subexpression: 0.10982ms
dead_code_elimination: 0.004077ms
propagate_constant: 0.165226ms
dead_code_elimination: 0.003922ms
optimize_module: 1.93279e+06ms
dead_code_elimination: 0.013573ms
eliminate_contiguous: 0.083232ms
dead_code_elimination: 0.012967ms
dead_code_elimination: 0.157796ms
layout_convolution: 0.820066ms
dead_code_elimination: 0.005006ms
dead_code_elimination: 0.008918ms
gpu::prefuse_ops: 65.2996ms
dead_code_elimination: 0.004803ms
eliminate_data_type: 0.033806ms
eliminate_data_type: 0.013766ms
eliminate_data_type: 0.010809ms
dead_code_elimination: 0.00372ms
rewrite_reduce: 2.11528ms
rewrite_low_precision: 0.531466ms
dead_code_elimination: 0.003784ms
simplify_reshapes: 4.82931ms
eliminate_convert: 0.025356ms
dead_code_elimination: 0.003839ms
simplify_algebra: 15.0259ms
eliminate_common_subexpression: 0.112924ms
dead_code_elimination: 0.004114ms
propagate_constant: 0.157291ms
dead_code_elimination: 0.003977ms
optimize_module: 20.2716ms
eliminate_identity: 0.003425ms
dead_code_elimination: 0.005786ms
dead_code_elimination: 0.007852ms
dead_code_elimination: 0.006952ms
fuse_pointwise: 0.646823ms
dead_code_elimination: 0.003169ms
dead_code_elimination: 0.00304ms
dead_code_elimination: 0.002774ms
dead_code_elimination: 0.002874ms
dead_code_elimination: 0.002737ms
fuse_reduce: 1.79423ms
eliminate_identity: 0.002011ms
dead_code_elimination: 0.002829ms
simplify_reshapes: 3.54099ms
eliminate_common_subexpression: 0.056343ms
dead_code_elimination: 0.003389ms
rewrite_reshapes: 4.98307ms
dead_code_elimination: 0.003637ms
simplify_reshapes: 5.22831ms
eliminate_common_subexpression: 0.103464ms
dead_code_elimination: 0.005391ms
rewrite_reshapes: 6.63407ms
fuse_pointwise: 11.7584ms
dead_code_elimination: 0.004482ms
simplify_reshapes: 4.88764ms
eliminate_common_subexpression: 0.055516ms
dead_code_elimination: 0.00337ms
rewrite_reshapes: 11.6166ms
dead_code_elimination: 0.002828ms
simplify_reshapes: 3.21346ms
eliminate_common_subexpression: 0.052724ms
dead_code_elimination: 0.003316ms
rewrite_reshapes: 6.67291ms
dead_code_elimination: 0.002756ms
simplify_reshapes: 3.62945ms
eliminate_common_subexpression: 0.054662ms
dead_code_elimination: 0.003435ms
rewrite_reshapes: 7.07297ms
dead_code_elimination: 0.002305ms
simplify_reshapes: 3.24776ms
eliminate_common_subexpression: 0.053027ms
dead_code_elimination: 0.003517ms
rewrite_reshapes: 6.72302ms
dead_code_elimination: 0.002443ms
fuse_reduce: 33.8908ms
split_reduce: 0.001837ms
eliminate_identity: 0.001267ms
dead_code_elimination: 0.002837ms
simplify_reshapes: 3.29937ms
eliminate_common_subexpression: 0.054322ms
dead_code_elimination: 0.005125ms
rewrite_reshapes: 4.62302ms
dead_code_elimination: 0.005235ms
fuse_pointwise: 4.8245ms
fuse_pointwise_reduce: 52.9695ms
dead_code_elimination: 0.004932ms
id: 9.1e-05ms
dead_code_elimination: 0.002957ms
dead_code_elimination: 0.002948ms
dead_code_elimination: 0.01755ms
gpu::fuse_mlir: 1.97339ms
dead_code_elimination: 0.002351ms
dead_code_elimination: 0.002113ms
fuse_concat: 0.072323ms
dead_code_elimination: 0.002121ms
auto_contiguous: 0.099396ms
dead_code_elimination: 0.002507ms
gpu::lowering: 1.20561ms
eliminate_contiguous: 55.0906ms
dead_code_elimination: 0.032125ms
eliminate_concat: 0.357974ms
dead_code_elimination: 0.003379ms
gpu::compile_miopen: 0.039968ms
dead_code_elimination: 0.002985ms
dead_code_elimination: 0.004629ms
dead_code_elimination: 0.003839ms
gpu::fuse_ops: 4.34631ms
dead_code_elimination: 0.00316ms
gpu::compile_hipblaslt: 0.021637ms
dead_code_elimination: 0.002938ms
replace_allocate: 0.84603ms
dead_code_elimination: 0.003958ms
adjust_allocation: 0.018698ms
dead_code_elimination: 0.002902ms
dead_code_elimination: 0.000625ms
dead_code_elimination: 0.000533ms
dead_code_elimination: 0.00056ms
dead_code_elimination: 0.000634ms
dead_code_elimination: 0.000541ms
dead_code_elimination: 0.000615ms
dead_code_elimination: 0.000459ms
dead_code_elimination: 0.000533ms
dead_code_elimination: 0.00044ms
dead_code_elimination: 0.000634ms
dead_code_elimination: 0.001276ms
dead_code_elimination: 0.000432ms
dead_code_elimination: 0.000771ms
dead_code_elimination: 0.00078ms
dead_code_elimination: 0.000744ms
dead_code_elimination: 0.000671ms
dead_code_elimination: 0.000423ms
dead_code_elimination: 0.000735ms
dead_code_elimination: 0.017981ms
dead_code_elimination: 0.000624ms
dead_code_elimination: 0.000634ms
dead_code_elimination: 0.000697ms
dead_code_elimination: 0.000689ms
dead_code_elimination: 0.000679ms
dead_code_elimination: 0.001001ms
dead_code_elimination: 0.000772ms
dead_code_elimination: 0.000762ms
dead_code_elimination: 0.000551ms
dead_code_elimination: 0.000743ms
dead_code_elimination: 0.00068ms
dead_code_elimination: 0.000698ms
dead_code_elimination: 0.000781ms
dead_code_elimination: 0.000817ms
dead_code_elimination: 0.000579ms
dead_code_elimination: 0.000606ms
dead_code_elimination: 0.000587ms
dead_code_elimination: 0.00056ms
dead_code_elimination: 0.000753ms
dead_code_elimination: 0.00102ms
dead_code_elimination: 0.001175ms
dead_code_elimination: 0.000909ms
dead_code_elimination: 0.001203ms
dead_code_elimination: 0.00134ms
dead_code_elimination: 0.000844ms
dead_code_elimination: 0.00136ms
dead_code_elimination: 0.001415ms
dead_code_elimination: 0.001267ms
dead_code_elimination: 0.001368ms
dead_code_elimination: 0.001286ms
dead_code_elimination: 0.001176ms
dead_code_elimination: 0.000772ms
dead_code_elimination: 0.00134ms
dead_code_elimination: 0.001441ms
dead_code_elimination: 0.001579ms
dead_code_elimination: 0.001552ms
dead_code_elimination: 0.001295ms
dead_code_elimination: 0.00123ms
dead_code_elimination: 0.001543ms
dead_code_elimination: 0.001377ms
dead_code_elimination: 0.000671ms
dead_code_elimination: 0.001305ms
rewrite_quantization: 0.001341ms
simplify_reshapes: 0.777747ms
eliminate_convert: 0.004923ms
dead_code_elimination: 0.00044ms
simplify_algebra: 1.35177ms
eliminate_common_subexpression: 0.005125ms
dead_code_elimination: 0.000386ms
propagate_constant: 0.052504ms
dead_code_elimination: 0.000744ms
optimize_module: 2.27441ms
dead_code_elimination: 0.001469ms
dead_code_elimination: 0.00124ms
dead_code_elimination: 0.00146ms
dead_code_elimination: 0.00147ms
dead_code_elimination: 0.001304ms
dead_code_elimination: 0.001524ms
dead_code_elimination: 0.001277ms
dead_code_elimination: 0.001239ms
dead_code_elimination: 0.001442ms
dead_code_elimination: 0.00135ms
dead_code_elimination: 0.001102ms
dead_code_elimination: 0.001543ms
dead_code_elimination: 0.001699ms
dead_code_elimination: 0.00124ms
dead_code_elimination: 0.001102ms
dead_code_elimination: 0.001313ms
dead_code_elimination: 0.001277ms
dead_code_elimination: 0.001368ms
dead_code_elimination: 0.001378ms
dead_code_elimination: 0.001203ms
dead_code_elimination: 0.001322ms
dead_code_elimination: 0.001286ms
dead_code_elimination: 0.001662ms
dead_code_elimination: 0.001276ms
dead_code_elimination: 0.001479ms
dead_code_elimination: 0.001212ms
dead_code_elimination: 0.001277ms
dead_code_elimination: 0.000873ms
dead_code_elimination: 0.001258ms
dead_code_elimination: 0.001487ms
dead_code_elimination: 0.001405ms
dead_code_elimination: 0.001507ms
dead_code_elimination: 0.001267ms
dead_code_elimination: 0.001037ms
dead_code_elimination: 0.001451ms
dead_code_elimination: 0.001487ms
dead_code_elimination: 0.001478ms
dead_code_elimination: 0.000772ms
dead_code_elimination: 0.00124ms
dead_code_elimination: 0.001304ms
dead_code_elimination: 0.001451ms
dead_code_elimination: 0.001396ms
dead_code_elimination: 0.001451ms
dead_code_elimination: 0.001323ms
dead_code_elimination: 0.001699ms
dead_code_elimination: 0.001313ms
dead_code_elimination: 0.001258ms
dead_code_elimination: 0.001038ms
dead_code_elimination: 0.001138ms
dead_code_elimination: 0.000789ms
dead_code_elimination: 0.001157ms
dead_code_elimination: 0.000946ms
dead_code_elimination: 0.001056ms
dead_code_elimination: 0.000588ms
dead_code_elimination: 0.001671ms
dead_code_elimination: 0.000827ms
dead_code_elimination: 0.001267ms
dead_code_elimination: 0.000606ms
dead_code_elimination: 0.001157ms
dead_code_elimination: 0.000588ms
dead_code_elimination: 0.001148ms
dead_code_elimination: 0.000625ms
dead_code_elimination: 0.001222ms
dead_code_elimination: 0.000771ms
dead_code_elimination: 0.000652ms
dead_code_elimination: 0.000799ms
dead_code_elimination: 0.0009ms
dead_code_elimination: 0.000744ms
dead_code_elimination: 0.000707ms
dead_code_elimination: 0.000716ms
dead_code_elimination: 0.000707ms
dead_code_elimination: 0.000964ms
dead_code_elimination: 0.000909ms
dead_code_elimination: 0.000698ms
dead_code_elimination: 0.000671ms
dead_code_elimination: 0.000661ms
dead_code_elimination: 0.000579ms
dead_code_elimination: 0.000515ms
dead_code_elimination: 0.000882ms
dead_code_elimination: 0.000771ms
dead_code_elimination: 0.000726ms
dead_code_elimination: 0.000836ms
dead_code_elimination: 0.000771ms
dead_code_elimination: 0.000716ms
dead_code_elimination: 0.001185ms
dead_code_elimination: 0.000634ms
dead_code_elimination: 0.001626ms
dead_code_elimination: 0.001478ms
dead_code_elimination: 0.000726ms
dead_code_elimination: 0.001543ms
dead_code_elimination: 0.00157ms
dead_code_elimination: 0.001534ms
dead_code_elimination: 0.001332ms
dead_code_elimination: 0.001478ms
dead_code_elimination: 0.001524ms
dead_code_elimination: 0.001075ms
dead_code_elimination: 0.001369ms
dead_code_elimination: 0.001386ms
dead_code_elimination: 0.001276ms
dead_code_elimination: 0.001377ms
dead_code_elimination: 0.001432ms
dead_code_elimination: 0.001396ms
dead_code_elimination: 0.001084ms
dead_code_elimination: 0.000698ms
dead_code_elimination: 0.001663ms
dead_code_elimination: 0.000744ms
dead_code_elimination: 0.001405ms
dead_code_elimination: 0.000845ms
dead_code_elimination: 0.001359ms
dead_code_elimination: 0.00089ms
gpu::compile_ops: 28979.6ms
dead_code_elimination: 0.023961ms
promote_literals: 0.573601ms
dead_code_elimination: 0.003462ms
gpu::write_literals: 0.03455ms
schedule: 0.191252ms
memory_coloring: 0.327476ms
sync_device: 0.00067ms
preallocate_param: 0.019267ms
dead_code_elimination: 0.003829ms
eliminate_allocation: 0.014364ms
check_context: 0.000919ms
normalize_ops: 74.4609ms
dead_code_elimination: 0.009891ms
eliminate_identity: 0.003757ms
2025-07-24 13:14:14.918559452 [W:onnxruntime:Default, migraphx_execution_provider.cc:1303 compile_program] Model Compile: Complete
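Most lines in this pass-timing log are sub-millisecond, so the slow passes are easier to spot by ranking them. Below is a minimal Python sketch that parses the `name: <time>ms` lines printed above and sums the time per pass name; lines mangled by concurrent output are simply skipped. The helper names `pass_totals` and `slowest` are illustrative. Judging by this log, `optimize_module` appears to wrap the passes printed just before it (its time is roughly their sum), so its total overlaps with `propagate_constant` rather than adding to it. Applied to this log, the two largest totals are `optimize_module` and `propagate_constant` at about 1,930 s each, out of roughly 2,142 s between the Begin and Complete timestamps, with `gpu::compile_ops` a distant third at about 29 s. That reflects this single run only.

```python
import re
from collections import defaultdict

# Matches lines such as "propagate_constant: 1.9325e+06ms" or "gpu::compile_ops: 28979.6ms".
# Interleaved fragments like "dead_code_elimination: dead_code_elimination: 0.00056ms"
# and ONNX Runtime's timestamped lines do not match and are ignored.
PASS_LINE = re.compile(r"^([A-Za-z_][\w:]*): ([0-9.]+(?:e[+-]?\d+)?)ms$")


def pass_totals(log_lines):
    """Sum the reported milliseconds per pass name over an iterable of log lines."""
    totals = defaultdict(float)
    for line in log_lines:
        m = PASS_LINE.match(line.strip())
        if m:
            totals[m.group(1)] += float(m.group(2))
    return dict(totals)


def slowest(totals, n=5):
    """Return the n (name, total_ms) pairs with the largest totals, slowest first.

    Nested passes (e.g. a wrapper and the passes it runs) are not de-duplicated.
    """
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

For example, with the log saved to a hypothetical `compile.log`: `with open("compile.log") as f: print(slowest(pass_totals(f)))`.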
Compile time seems to scale strongly with input shape: the higher the resolution, the longer it takes to compile. With a 256x256 input this same model compiles in roughly 4 minutes (down from 35 minutes with a 960x540 input).
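Taken at face value, those two data points are consistent with compile time growing close to proportionally with the number of input pixels. The arithmetic below fits a single power law, time ∝ pixels^k, through the two points. It is only a back-of-the-envelope check on approximate timings from one model on one machine, not a measured scaling law; the variable names are illustrative.

```python
import math

# Reported compile times for the same model (approximate, in minutes).
small_w, small_h, small_min = 256, 256, 4
large_w, large_h, large_min = 960, 540, 35

# 518400 pixels vs 65536 pixels.
pixel_ratio = (large_w * large_h) / (small_w * small_h)
time_ratio = large_min / small_min

# Exponent k in time ~ pixels**k that passes through both points.
k = math.log(time_ratio) / math.log(pixel_ratio)
```

This gives about 7.9x the pixels for 8.75x the compile time, so k comes out near 1.05.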
Would it be possible for us to get an update on this? It seems like MIGraphX is still taking a very long time to compile relatively simple models in the latest release.
Hello @Artoriuz, I will reproduce this issue and get back to you as soon as possible.
For me this appears to be resolved in the latest release.
Compiling RIFE 4.25-heavy at 1280x768 (Ryzen 9800X3D + Radeon 7900 XTX):
- ROCm 6.4.2 + MIGraphX 2.12.0: more than 7 hours (I left it running overnight and it did not finish)
- ROCm 7.0.2 + MIGraphX 2.13.0: 1.5 minutes
ArtCNN R8F64 at 1600x900 still takes around 30 minutes to compile on my 7950X + 7900 XTX (ROCm 7.0.2 / MIGraphX 2.13.0).