AMDMIGraphX Exhaustive tune reduce operators

This exhaustively tunes the block size and the algorithm chosen for the reduce operators. It also fixes the benchmarking.
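
As a rough illustration of what exhaustive tuning over block size and algorithm means, here is a minimal Python sketch. It is not the MIGraphX implementation: `exhaustive_tune`, `benchmark`, `make_kernel`, and the algorithm names used in the usage note are hypothetical placeholders, and the real tuner lives in the C++ code this PR changes. The idea is simply to time every (algorithm, block size) pair and keep the fastest one.

```python
import itertools
import time
from typing import Callable, Dict, Iterable, Tuple

# A candidate configuration: (algorithm name, block size). Both are
# illustrative stand-ins, not MIGraphX identifiers.
Config = Tuple[str, int]


def benchmark(run: Callable[[], None], repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) of `run` over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best


def exhaustive_tune(
    algorithms: Iterable[str],
    block_sizes: Iterable[int],
    make_kernel: Callable[[str, int], Callable[[], None]],
    timer: Callable[[Callable[[], None]], float] = benchmark,
) -> Tuple[Config, Dict[Config, float]]:
    """Time every (algorithm, block size) pair and return the fastest.

    `make_kernel` is a hypothetical factory that builds a runnable
    kernel for one configuration.
    """
    timings: Dict[Config, float] = {}
    for config in itertools.product(algorithms, block_sizes):
        timings[config] = timer(make_kernel(*config))
    best = min(timings, key=timings.get)
    return best, timings
```

A caller would pass the candidate lists, for example `exhaustive_tune(["block", "wave"], [64, 128, 256], make_kernel)`, and use the returned configuration. Taking the minimum over repeated runs is one common way to reduce timing noise; whether the PR's benchmarking fix does this is not stated in the description.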

Jan 09 '25 22:01 pfultz2

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #3751   +/-   ##
========================================
  Coverage    92.11%   92.11%           
========================================
  Files          525      526    +1     
  Lines        24119    24119           
========================================
  Hits         22216    22216           
  Misses        1903     1903

Files with missing lines	Coverage Δ
src/include/migraphx/bit.hpp	`100.00% <100.00%> (ø)`
src/include/migraphx/generic_float.hpp	`99.14% <ø> (-0.02%)`	:arrow_down:

Jan 09 '25 23:01 codecov[bot]

@turneram Why did you close this?

Mar 07 '25 16:03 pfultz2

/AzurePipelines run

Mar 07 '25 22:03 jayhawk-commits

Azure Pipelines successfully started running 1 pipeline(s).

Mar 07 '25 22:03 azure-pipelines[bot]

Test	Batch	Rate new 073c3e	Rate old b4ba1c	Diff	Compare
torchvision-resnet50	64	3,258.31	3,235.95	0.69%	:white_check_mark:
torchvision-resnet50_fp16	64	6,934.90	6,908.41	0.38%	:white_check_mark:
torchvision-densenet121	32	2,457.17	2,445.21	0.49%	:white_check_mark:
torchvision-densenet121_fp16	32	4,220.96	4,219.44	0.04%	:white_check_mark:
torchvision-inceptionv3	32	1,627.71	1,618.24	0.59%	:white_check_mark:
torchvision-inceptionv3_fp16	32	2,721.76	2,714.76	0.26%	:white_check_mark:
cadene-inceptionv4	16	761.13	756.47	0.62%	:white_check_mark:
cadene-resnext64x4	16	818.72	814.62	0.50%	:white_check_mark:
slim-mobilenet	64	7,475.60	7,433.86	0.56%	:white_check_mark:
slim-nasnetalarge	64	217.81	216.86	0.44%	:white_check_mark:
slim-resnet50v2	64	3,460.26	3,440.36	0.58%	:white_check_mark:
bert-mrpc-onnx	8	1,150.66	1,138.32	1.08%	:white_check_mark:
bert-mrpc-tf	1	457.66	454.52	0.69%	:white_check_mark:
pytorch-examples-wlang-gru	1	485.95	475.40	2.22%	:white_check_mark:
pytorch-examples-wlang-lstm	1	448.87	443.05	1.31%	:white_check_mark:
torchvision-resnet50_1	1	810.86	818.79	-0.97%	:white_check_mark:
cadene-dpn92_1	1	431.87	425.66	1.46%	:white_check_mark:
cadene-resnext101_1	1	393.54	391.85	0.43%	:white_check_mark:
onnx-taau-downsample	1	396.86	394.70	0.55%	:white_check_mark:
dlrm-criteoterabyte	1	32.34	32.18	0.49%	:white_check_mark:
dlrm-criteoterabyte_fp16	1	51.21	51.12	0.17%	:white_check_mark:
agentmodel	1	10,462.72	9,553.93	9.51%	:high_brightness:
unet_fp16	2	58.61	58.49	0.21%	:white_check_mark:
resnet50v1_fp16	1	1,080.35	1,088.59	-0.76%	:white_check_mark:
resnet50v1_int8	1	1,049.93	1,077.39	-2.55%	:white_check_mark:
bert_base_cased_fp16	64	1,170.94	1,163.40	0.65%	:white_check_mark:
bert_large_uncased_fp16	32	356.30	354.51	0.50%	:white_check_mark:
bert_large_fp16	1	196.31	193.67	1.36%	:white_check_mark:
distilgpt2_fp16	16	2,232.85	2,217.48	0.69%	:white_check_mark:
yolov5s	1	540.21	547.68	-1.36%	:white_check_mark:
tinyllama	1	43.89	43.63	0.61%	:white_check_mark:
vicuna-fastchat	1	45.03	43.89	2.60%	:white_check_mark:
whisper-tiny-encoder	1	421.89	420.34	0.37%	:white_check_mark:
whisper-tiny-decoder	1	413.78	412.38	0.34%	:white_check_mark:
llama2_7b	1	nan	nan	nan%	:x:
qwen1.5-7b	1	23.54	23.38	0.65%	:white_check_mark:
phi3-3.8b	1	nan	nan	nan%	:x:
mask-rcnn	1	18.72	18.31	2.26%	:white_check_mark:
llama3-8b	1	21.73	21.64	0.42%	:white_check_mark:
whisper-large-encoder	1	10.21	10.17	0.43%	:white_check_mark:
whisper-large-decoder	1	99.51	97.86	1.69%	:white_check_mark:
mistral-7b	1	23.78	23.65	0.52%	:white_check_mark:
FLUX.1-schnell	1	879.22	895.71	-1.84%	:white_check_mark:
nan	nan	nan	nan	nan%	:x:
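
The Diff column appears consistent with the relative change of the new rate against the old one, `(new - old) / old`, expressed as a percentage. The sketch below recomputes it for checking individual rows; `percent_diff` is a hypothetical helper name, and the formula is inferred from the table values rather than taken from the bot's source.

```python
def percent_diff(rate_new: float, rate_old: float) -> float:
    """Relative change of rate_new versus rate_old, in percent.

    Inferred from the table above, not from the bot's code.
    """
    return (rate_new - rate_old) / rate_old * 100.0


# Rows copied from the table: (test, rate new, rate old).
rows = [
    ("torchvision-resnet50", 3258.31, 3235.95),
    ("agentmodel", 10462.72, 9553.93),
    ("resnet50v1_int8", 1049.93, 1077.39),
]

for name, new, old in rows:
    print(f"{name}: {percent_diff(new, old):.2f}%")
```

Under this reading, a positive Diff means the new commit ran faster, since the rates are throughput figures.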

This build is not recommended to merge :red_circle:

Apr 19 '25 04:04 migraphx-bot

:white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

:x: bert-mrpc-tf: ERROR - check error output

2025-04-18 21:54:01.854436: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745031247.299536 163665 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 62973 MB memory: -> device: 0, name: AMD Instinct MI250X/MI250, pci bus id: 0000:b3:00.0
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1745031248.165054 163665 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-04-18 21:54:17.873914: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874193: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874239: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874287: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874315: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874362: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874405: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
2025-04-18 21:54:17.874454: E external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:250] bitcode module is required by this HLO module but was not found at ./opencl.bc
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
error: Failure when generating HSACO
2025-04-18 21:54:17.875536: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:228] INTERNAL: Generating device code failed.
2025-04-18 21:54:17.876711: W tensorflow/core/framework/op_kernel.cc:1829] UNKNOWN: JIT compilation failed.
2025-04-18 21:54:17.876733: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
2025-04-18 21:54:17.876745: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
2025-04-18 21:54:17.876761: I tensorflow/core/framework/local_rendezvous.cc:424] Local rendezvous recv item cancelled. Key hash: 11217777527359497193
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1407, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1390, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1483, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 324, in main
    y_out = sess.run(y, feed_dict=tf_dict)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 977, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1220, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1400, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/client/session.py", line 1426, in _do_call
    raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.UnknownError: Graph execution error:

Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
Detected at node 'import/bert/embeddings/LayerNorm/moments/SquaredDifference' defined at (most recent call last):
Node: 'import/bert/embeddings/LayerNorm/moments/SquaredDifference'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
[[import/loss/output/_21]]
(1) UNKNOWN: JIT compilation failed.
[[{{node import/bert/embeddings/LayerNorm/moments/SquaredDifference}}]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'import/bert/embeddings/LayerNorm/moments/SquaredDifference':

:white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

:white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

:white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

:white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance

:white_check_mark: unet: PASSED: MIGraphX meets tolerance

:white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance

:white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: bert_large: PASSED: MIGraphX meets tolerance

:white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance

:white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance

:white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance

:x: llama2_7b: ERROR - check error output

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:265: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx

:x: phi3-3.8b: ERROR - check error output

Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:265: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/phi3-3.8b/model.onnx

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

:white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance

:white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance

:white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance

:white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

Apr 19 '25 04:04 migraphx-bot