Add Naive Arithmetic intensity to GEMMs in perf report
This PR modifies the perf report for hipblaslt and rocblas GEMMs to include a naive arithmetic intensity value. The motivation is to use this value to get a first-order indication of whether the GEMM leans compute-bound or memory-bound.
- TODO: include other pointwise fusions (this will require a different calculation per operator)
- accounts for beta in case the GEMM is computing AB + C
- also removed the deprecated `apply_alpha_beta` files, which didn't seem to be used anywhere
- a similar hipblaslt calculation can be found here: https://github.com/ROCm/hipBLASLt/blob/develop/clients/include/flops.hpp#L39
program.cpp is target-independent and shouldn't be checking for target-dependent operators. Such a calculation is also operator-specific and should reside with the operator. The operation class should have a method that returns the theoretical number of math operations:

```cpp
std::size_t get_number_of_math_ops(const std::vector<shape>& inputs) const;
```

This can return 0 by default. For compiled code objects we can add a field to store this value when the code object gets built, so compile_mlir can fill in the field when compiling by simply summing up get_number_of_math_ops over the mlir module.
The memory transactions can be computed generically as these are just the number of elements.
This information does not come from benchmarking, so it would be better to have a driver command that prints it out using the annotate function.
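A minimal sketch of what that interface could look like, under stated assumptions: the `shape` stand-in, the `dot` formula, and the `get_number_of_elements` helper name are all hypothetical, not MIGraphX's actual API. The base class returns 0, a GEMM-like operator overrides it, and the memory side falls out generically from element counts.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Stand-in for migraphx::shape (hypothetical): just the dimensions.
struct shape
{
    std::vector<std::size_t> lens;
    std::size_t elements() const
    {
        return std::accumulate(lens.begin(), lens.end(), std::size_t{1},
                               std::multiplies<>{});
    }
};

// Base operation: zero math ops unless an operator says otherwise.
struct operation
{
    virtual ~operation() = default;
    virtual std::size_t
    get_number_of_math_ops(const std::vector<shape>& inputs) const
    {
        return 0;
    }
};

// GEMM-like operator: 2*M*N*K for A (M x K) times B (K x N).
struct dot : operation
{
    std::size_t
    get_number_of_math_ops(const std::vector<shape>& inputs) const override
    {
        const auto& a = inputs[0].lens; // {M, K}
        const auto& b = inputs[1].lens; // {K, N}
        return 2 * a[0] * a[1] * b[1];
    }
};

// Memory transactions computed generically: the sum of element counts
// across all inputs plus the output.
std::size_t get_number_of_elements(const std::vector<shape>& inputs,
                                   const shape& output)
{
    std::size_t n = output.elements();
    for(const auto& s : inputs)
        n += s.elements();
    return n;
}
```

With this split, program.cpp never needs to know which operators are GEMMs; it just queries the operation (or the stored field on a compiled code object) and divides by the generic element count.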
Codecov Report
Attention: Patch coverage is 29.16667% with 17 lines in your changes missing coverage. Please review.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/program.cpp | 29.17% | 17 Missing :warning: |
Additional details and impacted files
```
@@            Coverage Diff            @@
##           develop    #3942      +/-   ##
===========================================
- Coverage    92.41%   92.35%    -0.06%
===========================================
  Files          522      522
  Lines        22532    22566       +34
===========================================
+ Hits         20822    20839       +17
- Misses        1710     1727       +17
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/program.cpp | 68.74% <29.17%> (-1.46%) :arrow_down: |
| Test | Batch | Rate new 6a7eeb | Rate old fea22b | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,252.21 | 3,224.68 | 0.85% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,903.62 | 6,871.66 | 0.47% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,440.75 | 2,433.28 | 0.31% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,219.93 | 4,210.30 | 0.23% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,622.42 | 1,614.36 | 0.50% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,710.39 | 2,696.30 | 0.52% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 754.05 | 749.80 | 0.57% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 815.21 | 810.05 | 0.64% | :white_check_mark: |
| slim-mobilenet | 64 | 6,693.13 | 6,653.72 | 0.59% | :white_check_mark: |
| slim-nasnetalarge | 64 | 197.44 | 196.50 | 0.47% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,452.78 | 3,436.85 | 0.46% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,150.96 | 1,141.83 | 0.80% | :white_check_mark: |
| bert-mrpc-tf | 1 | 464.27 | 463.91 | 0.08% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 476.70 | 474.32 | 0.50% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 433.32 | 466.24 | -7.06% | :red_circle: |
| torchvision-resnet50_1 | 1 | 810.61 | 810.37 | 0.03% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 425.43 | 424.31 | 0.26% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 394.34 | 392.70 | 0.42% | :white_check_mark: |
| onnx-taau-downsample | 1 | 397.27 | 396.47 | 0.20% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.94 | 31.82 | 0.37% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.05 | 51.01 | 0.09% | :white_check_mark: |
| agentmodel | 1 | 8,589.66 | 8,756.49 | -1.91% | :white_check_mark: |
| unet_fp16 | 2 | nan | 58.40 | nan% | :x: |
| resnet50v1_fp16 | 1 | 1,068.73 | 1,075.67 | -0.64% | :white_check_mark: |
| resnet50v1_int8 | 1 | 882.81 | 891.37 | -0.96% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,170.04 | 1,161.69 | 0.72% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.98 | 354.15 | 0.52% | :white_check_mark: |
| bert_large_fp16 | 1 | 197.36 | 193.72 | 1.88% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,231.98 | 2,214.41 | 0.79% | :white_check_mark: |
| yolov5s | 1 | 513.04 | 514.41 | -0.27% | :white_check_mark: |
| tinyllama | 1 | 43.86 | 43.60 | 0.59% | :white_check_mark: |
| vicuna-fastchat | 1 | 44.22 | 43.94 | 0.63% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 413.36 | 410.08 | 0.80% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 411.76 | 410.36 | 0.34% | :white_check_mark: |
| llama2_7b | 1 | nan | nan | nan% | :x: |
| qwen1.5-7b | 1 | 23.54 | 23.45 | 0.39% | :white_check_mark: |
| phi3-3.8b | 1 | nan | nan | nan% | :x: |
| mask-rcnn | 1 | 18.56 | 18.47 | 0.49% | :white_check_mark: |
| llama3-8b | 1 | 21.28 | 21.18 | 0.48% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.22 | 10.17 | 0.49% | :white_check_mark: |
| whisper-large-decoder | 1 | 98.27 | 98.10 | 0.17% | :white_check_mark: |
| mistral-7b | 1 | 23.77 | 23.65 | 0.52% | :white_check_mark: |
| FLUX.1-schnell | 1 | 899.70 | 908.97 | -1.02% | :white_check_mark: |
This build is not recommended for merge :red_circle:
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:x: llama2_7b: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx
```

:x: qwen1.5-7b: ERROR - check error output
```
usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
                           [--batch BATCH] [--fill1] [--fill0] [--fp16]
                           [--argmax] [--verbose] [--tolerance TOLERANCE]
                           [--input-dim INPUT_DIM] [--target TARGET]
                           [--ort-run] [--ort-logging]
                           [--disable-offload-copy] [--disable-fast-math]
                           [--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256
```

:x: phi3-3.8b: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/phi3-3.8b/model.onnx
```

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
:x: whisper-large-encoder: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers
```