
Add Naive Arithmetic intensity to GEMMs in perf report

Open kahmed10 opened this issue 11 months ago • 4 comments

This PR modifies the perf report for hipblaslt and rocblas GEMMs to include a naive arithmetic-intensity value. The motivation is to get a first-order indication of whether a GEMM is leaning compute bound or memory bound.

  • TODO: include other pointwise fusions (will require a different calculation per operator)
  • Accounts for beta in case the GEMM is computing AB + C
  • Also removes the deprecated apply_alpha_beta files, which no longer appeared to be used anywhere
  • A similar hipblaslt calculation can be found here: https://github.com/ROCm/hipBLASLt/blob/develop/clients/include/flops.hpp#L39
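For reference, the naive numbers reduce to a handful of multiplies. Below is a minimal sketch of the calculation (the function name is illustrative, not this PR's code; the exact beta accounting is an assumption here, not necessarily what the patch or hipBLASLt does):

```cpp
#include <cstddef>

// Naive arithmetic intensity for D = A*B (+ C), with A: MxK, B: KxN,
// C/D: MxN, each element `bytes_per_elem` bytes wide.
// FLOPs: 2*M*N*K for the multiply-accumulates; when beta is used we also
// count an extra M*N adds and the extra read of C (an assumption).
double naive_gemm_intensity(std::size_t m, std::size_t n, std::size_t k,
                            std::size_t bytes_per_elem, bool with_beta)
{
    double flops = 2.0 * static_cast<double>(m) * n * k +
                   (with_beta ? static_cast<double>(m) * n : 0.0);
    // Memory traffic: read A and B, read C when beta is used, write D.
    double bytes = static_cast<double>(bytes_per_elem) *
                   (static_cast<double>(m) * k + static_cast<double>(k) * n +
                    (with_beta ? 2.0 : 1.0) * static_cast<double>(m) * n);
    return flops / bytes; // FLOPs per byte moved
}
```

Comparing this value against the device's FLOPs-per-byte ratio (peak compute divided by memory bandwidth) gives the first-order compute-vs-memory-bound classification the PR description mentions.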

kahmed10 avatar Apr 09 '25 20:04 kahmed10

program.cpp is target-independent and shouldn't be checking for target-dependent operators. Also, such a calculation is operator-specific and should reside with the operator. The operation class should have a method that returns the theoretical number of math operations:

std::size_t get_number_of_math_ops(const std::vector<shape>& inputs) const;

This can return 0 by default. For compiled code objects we can add a field to store this when the code object gets built. So compile_mlir can fill in this field when compiling by just summing up the get_number_of_math_ops from the mlir module.
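A sketch of what that hook could look like (hypothetical types: `shape` here is a bare stand-in for `migraphx::shape`, and the `dot` override assumes plain M×K by K×N inputs; this is not MIGraphX's actual operation interface):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for migraphx::shape: just the dimensions.
struct shape
{
    std::vector<std::size_t> lens;
};

// Default: operators report 0 math ops unless they override the method.
struct operation
{
    virtual std::size_t get_number_of_math_ops(const std::vector<shape>& inputs) const
    {
        (void)inputs;
        return 0;
    }
    virtual ~operation() = default;
};

// A dot/gemm operator overrides it with 2*M*N*K.
struct dot : operation
{
    std::size_t get_number_of_math_ops(const std::vector<shape>& inputs) const override
    {
        assert(inputs.size() >= 2);
        const auto& a = inputs[0].lens; // ... x M x K
        const auto& b = inputs[1].lens; // ... x K x N
        std::size_t m = a[a.size() - 2];
        std::size_t k = a[a.size() - 1];
        std::size_t n = b[b.size() - 1];
        return 2 * m * n * k;
    }
};
```

A compiled code object would then carry the sum of these values over its module, filled in at compile time as suggested above.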

The memory transactions can be computed generically, since they are just the number of elements read and written.
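Counting the memory side generically could look like this (again with a hypothetical `shape` stand-in; real code would use the shapes' element counts and type sizes):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in for migraphx::shape: dimensions plus element size.
struct shape
{
    std::vector<std::size_t> lens;
    std::size_t type_size; // bytes per element
    std::size_t bytes() const
    {
        std::size_t n = std::accumulate(lens.begin(), lens.end(), std::size_t{1},
                                        std::multiplies<>{});
        return n * type_size;
    }
};

// Total memory traffic of an instruction: read every input, write the output.
std::size_t memory_bytes(const std::vector<shape>& inputs, const shape& output)
{
    std::size_t total = output.bytes();
    for(const auto& s : inputs)
        total += s.bytes();
    return total;
}
```

Dividing the math-op count by this byte count yields the same naive intensity figure without any operator-specific logic on the memory side.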

This information does not come from benchmarking, so it would be better to have a driver command that prints it using the annotate function.

pfultz2 avatar Apr 09 '25 22:04 pfultz2

Codecov Report

Attention: Patch coverage is 29.16667% with 17 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/program.cpp 29.17% 17 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3942      +/-   ##
===========================================
- Coverage    92.41%   92.35%   -0.06%     
===========================================
  Files          522      522              
  Lines        22532    22566      +34     
===========================================
+ Hits         20822    20839      +17     
- Misses        1710     1727      +17     
Files with missing lines Coverage Δ
src/program.cpp 68.74% <29.17%> (-1.46%) :arrow_down:

codecov[bot] avatar Apr 10 '25 01:04 codecov[bot]

Test Batch Rate new (6a7eeb) Rate old (fea22b) Diff Compare
torchvision-resnet50 64 3,252.21 3,224.68 0.85% :white_check_mark:
torchvision-resnet50_fp16 64 6,903.62 6,871.66 0.47% :white_check_mark:
torchvision-densenet121 32 2,440.75 2,433.28 0.31% :white_check_mark:
torchvision-densenet121_fp16 32 4,219.93 4,210.30 0.23% :white_check_mark:
torchvision-inceptionv3 32 1,622.42 1,614.36 0.50% :white_check_mark:
torchvision-inceptionv3_fp16 32 2,710.39 2,696.30 0.52% :white_check_mark:
cadene-inceptionv4 16 754.05 749.80 0.57% :white_check_mark:
cadene-resnext64x4 16 815.21 810.05 0.64% :white_check_mark:
slim-mobilenet 64 6,693.13 6,653.72 0.59% :white_check_mark:
slim-nasnetalarge 64 197.44 196.50 0.47% :white_check_mark:
slim-resnet50v2 64 3,452.78 3,436.85 0.46% :white_check_mark:
bert-mrpc-onnx 8 1,150.96 1,141.83 0.80% :white_check_mark:
bert-mrpc-tf 1 464.27 463.91 0.08% :white_check_mark:
pytorch-examples-wlang-gru 1 476.70 474.32 0.50% :white_check_mark:
pytorch-examples-wlang-lstm 1 433.32 466.24 -7.06% :red_circle:
torchvision-resnet50_1 1 810.61 810.37 0.03% :white_check_mark:
cadene-dpn92_1 1 425.43 424.31 0.26% :white_check_mark:
cadene-resnext101_1 1 394.34 392.70 0.42% :white_check_mark:
onnx-taau-downsample 1 397.27 396.47 0.20% :white_check_mark:
dlrm-criteoterabyte 1 31.94 31.82 0.37% :white_check_mark:
dlrm-criteoterabyte_fp16 1 51.05 51.01 0.09% :white_check_mark:
agentmodel 1 8,589.66 8,756.49 -1.91% :white_check_mark:
unet_fp16 2 nan 58.40 nan% :x:
resnet50v1_fp16 1 1,068.73 1,075.67 -0.64% :white_check_mark:
resnet50v1_int8 1 882.81 891.37 -0.96% :white_check_mark:
bert_base_cased_fp16 64 1,170.04 1,161.69 0.72% :white_check_mark:
bert_large_uncased_fp16 32 355.98 354.15 0.52% :white_check_mark:
bert_large_fp16 1 197.36 193.72 1.88% :white_check_mark:
distilgpt2_fp16 16 2,231.98 2,214.41 0.79% :white_check_mark:
yolov5s 1 513.04 514.41 -0.27% :white_check_mark:
tinyllama 1 43.86 43.60 0.59% :white_check_mark:
vicuna-fastchat 1 44.22 43.94 0.63% :white_check_mark:
whisper-tiny-encoder 1 413.36 410.08 0.80% :white_check_mark:
whisper-tiny-decoder 1 411.76 410.36 0.34% :white_check_mark:
llama2_7b 1 nan nan nan% :x:
qwen1.5-7b 1 23.54 23.45 0.39% :white_check_mark:
phi3-3.8b 1 nan nan nan% :x:
mask-rcnn 1 18.56 18.47 0.49% :white_check_mark:
llama3-8b 1 21.28 21.18 0.48% :white_check_mark:
whisper-large-encoder 1 10.22 10.17 0.49% :white_check_mark:
whisper-large-decoder 1 98.27 98.10 0.17% :white_check_mark:
mistral-7b 1 23.77 23.65 0.52% :white_check_mark:
FLUX.1-schnell 1 899.70 908.97 -1.02% :white_check_mark:

This build is not recommended to merge :red_circle:

migraphx-bot avatar Apr 10 '25 05:04 migraphx-bot


     :white_check_mark: bert-mrpc-onnx: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert-mrpc-tf: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance
     :white_check_mark: pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance
     :white_check_mark: torchvision-resnet50_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-dpn92_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: cadene-resnext101_1: PASSED: MIGraphX meets tolerance
     :white_check_mark: dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance
     :white_check_mark: agentmodel: PASSED: MIGraphX meets tolerance
     :white_check_mark: unet: PASSED: MIGraphX meets tolerance
     :white_check_mark: resnet50v1: PASSED: MIGraphX meets tolerance
     :white_check_mark: bert_base_cased_fp16: PASSED: MIGraphX meets tolerance
     :red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

     :white_check_mark: bert_large: PASSED: MIGraphX meets tolerance
     :white_check_mark: yolov5s: PASSED: MIGraphX meets tolerance
     :white_check_mark: tinyllama: PASSED: MIGraphX meets tolerance
     :white_check_mark: vicuna-fastchat: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-encoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: whisper-tiny-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: distilgpt2_fp16: PASSED: MIGraphX meets tolerance
:x: llama2_7b: ERROR - check error output
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx

:x: qwen1.5-7b: ERROR - check error output
usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
[--batch BATCH] [--fill1] [--fill0] [--fp16]
[--argmax] [--verbose] [--tolerance TOLERANCE]
[--input-dim INPUT_DIM] [--target TARGET]
[--ort-run] [--ort-logging]
[--disable-offload-copy] [--disable-fast-math]
[--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256

:x: phi3-3.8b: ERROR - check error output
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/phi3-3.8b/model.onnx

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output

     :white_check_mark: llama3-8b: PASSED: MIGraphX meets tolerance
:x: whisper-large-encoder: ERROR - check error output
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers

     :white_check_mark: whisper-large-decoder: PASSED: MIGraphX meets tolerance
     :white_check_mark: mistral-7b: PASSED: MIGraphX meets tolerance
     :white_check_mark: FLUX.1-schnell: PASSED: MIGraphX meets tolerance

migraphx-bot avatar Apr 10 '25 05:04 migraphx-bot