Add Naive Arithmetic intensity to GEMMs in perf report
This PR modifies the perf report for hipblaslt and rocblas GEMMs to include a naive arithmetic intensity value. The motivation is to use this value to get a first-order indication of whether the GEMM leans compute-bound or memory-bound.
- TODO: include other pointwise fusions (this will require a different calculation per operator)
- accounts for beta in case the GEMM is computing AB + C
- also removed the deprecated `apply_alpha_beta` files, which didn't seem to be used anywhere
- a similar hipblaslt calculation can be found here: https://github.com/ROCm/hipBLASLt/blob/develop/clients/include/flops.hpp#L39
program.cpp is target-independent and shouldn't be checking for target-dependent operators. Such a calculation is also operator-specific and should reside with the operator. The operation class should have a method that returns the theoretical number of math operations:

```cpp
std::size_t get_number_of_math_ops(const std::vector<shape>& inputs) const;
```

This can return 0 by default. For compiled code objects we can add a field to store this value when the code object gets built, so compile_mlir can fill in the field when compiling by simply summing up get_number_of_math_ops over the mlir module.
The memory transactions can be computed generically as these are just the number of elements.
This information does not come from benchmarking, so it would be better to have a driver command that prints it out using the annotate function.
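A minimal sketch of what that interface could look like, under stated assumptions: the `shape` stand-in, the `dot` formula, and the `get_number_of_elements` helper name are all hypothetical, not MIGraphX's actual API. The base class returns 0, a GEMM-like operator overrides it, and the memory side falls out generically from element counts.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Stand-in for migraphx::shape (hypothetical): just the dimensions.
struct shape
{
    std::vector<std::size_t> lens;
    std::size_t elements() const
    {
        return std::accumulate(lens.begin(), lens.end(), std::size_t{1},
                               std::multiplies<>{});
    }
};

// Base operation: zero math ops unless an operator says otherwise.
struct operation
{
    virtual ~operation() = default;
    virtual std::size_t
    get_number_of_math_ops(const std::vector<shape>& inputs) const
    {
        return 0;
    }
};

// GEMM-like operator: 2*M*N*K for A (M x K) times B (K x N).
struct dot : operation
{
    std::size_t
    get_number_of_math_ops(const std::vector<shape>& inputs) const override
    {
        const auto& a = inputs[0].lens; // {M, K}
        const auto& b = inputs[1].lens; // {K, N}
        return 2 * a[0] * a[1] * b[1];
    }
};

// Memory transactions computed generically: the sum of element counts
// across all inputs plus the output.
std::size_t get_number_of_elements(const std::vector<shape>& inputs,
                                   const shape& output)
{
    std::size_t n = output.elements();
    for(const auto& s : inputs)
        n += s.elements();
    return n;
}
```

With this split, program.cpp never needs to know which operators are GEMMs; it just queries the operation (or the stored field on a compiled code object) and divides by the generic element count.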
Codecov Report
Attention: Patch coverage is 29.16667% with 17 lines in your changes missing coverage. Please review.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/program.cpp | 29.17% | 17 Missing :warning: |
Additional details and impacted files
```
@@            Coverage Diff            @@
##           develop    #3942      +/-   ##
===========================================
- Coverage    92.41%   92.35%    -0.06%
===========================================
  Files          522      522
  Lines        22532    22566       +34
===========================================
+ Hits         20822    20839       +17
- Misses        1710     1727       +17
```
| Files with missing lines | Coverage Δ |
|---|---|
| src/program.cpp | 68.74% <29.17%> (-1.46%) :arrow_down: |
| Test | Batch | Rate new 6a7eeb | Rate old fea22b | Diff | Compare |
|---|---|---|---|---|---|
| torchvision-resnet50 | 64 | 3,252.21 | 3,224.68 | 0.85% | :white_check_mark: |
| torchvision-resnet50_fp16 | 64 | 6,903.62 | 6,871.66 | 0.47% | :white_check_mark: |
| torchvision-densenet121 | 32 | 2,440.75 | 2,433.28 | 0.31% | :white_check_mark: |
| torchvision-densenet121_fp16 | 32 | 4,219.93 | 4,210.30 | 0.23% | :white_check_mark: |
| torchvision-inceptionv3 | 32 | 1,622.42 | 1,614.36 | 0.50% | :white_check_mark: |
| torchvision-inceptionv3_fp16 | 32 | 2,710.39 | 2,696.30 | 0.52% | :white_check_mark: |
| cadene-inceptionv4 | 16 | 754.05 | 749.80 | 0.57% | :white_check_mark: |
| cadene-resnext64x4 | 16 | 815.21 | 810.05 | 0.64% | :white_check_mark: |
| slim-mobilenet | 64 | 6,693.13 | 6,653.72 | 0.59% | :white_check_mark: |
| slim-nasnetalarge | 64 | 197.44 | 196.50 | 0.47% | :white_check_mark: |
| slim-resnet50v2 | 64 | 3,452.78 | 3,436.85 | 0.46% | :white_check_mark: |
| bert-mrpc-onnx | 8 | 1,150.96 | 1,141.83 | 0.80% | :white_check_mark: |
| bert-mrpc-tf | 1 | 464.27 | 463.91 | 0.08% | :white_check_mark: |
| pytorch-examples-wlang-gru | 1 | 476.70 | 474.32 | 0.50% | :white_check_mark: |
| pytorch-examples-wlang-lstm | 1 | 433.32 | 466.24 | -7.06% | :red_circle: |
| torchvision-resnet50_1 | 1 | 810.61 | 810.37 | 0.03% | :white_check_mark: |
| cadene-dpn92_1 | 1 | 425.43 | 424.31 | 0.26% | :white_check_mark: |
| cadene-resnext101_1 | 1 | 394.34 | 392.70 | 0.42% | :white_check_mark: |
| onnx-taau-downsample | 1 | 397.27 | 396.47 | 0.20% | :white_check_mark: |
| dlrm-criteoterabyte | 1 | 31.94 | 31.82 | 0.37% | :white_check_mark: |
| dlrm-criteoterabyte_fp16 | 1 | 51.05 | 51.01 | 0.09% | :white_check_mark: |
| agentmodel | 1 | 8,589.66 | 8,756.49 | -1.91% | :white_check_mark: |
| unet_fp16 | 2 | nan | 58.40 | nan% | :x: |
| resnet50v1_fp16 | 1 | 1,068.73 | 1,075.67 | -0.64% | :white_check_mark: |
| resnet50v1_int8 | 1 | 882.81 | 891.37 | -0.96% | :white_check_mark: |
| bert_base_cased_fp16 | 64 | 1,170.04 | 1,161.69 | 0.72% | :white_check_mark: |
| bert_large_uncased_fp16 | 32 | 355.98 | 354.15 | 0.52% | :white_check_mark: |
| bert_large_fp16 | 1 | 197.36 | 193.72 | 1.88% | :white_check_mark: |
| distilgpt2_fp16 | 16 | 2,231.98 | 2,214.41 | 0.79% | :white_check_mark: |
| yolov5s | 1 | 513.04 | 514.41 | -0.27% | :white_check_mark: |
| tinyllama | 1 | 43.86 | 43.60 | 0.59% | :white_check_mark: |
| vicuna-fastchat | 1 | 44.22 | 43.94 | 0.63% | :white_check_mark: |
| whisper-tiny-encoder | 1 | 413.36 | 410.08 | 0.80% | :white_check_mark: |
| whisper-tiny-decoder | 1 | 411.76 | 410.36 | 0.34% | :white_check_mark: |
| llama2_7b | 1 | nan | nan | nan% | :x: |
| qwen1.5-7b | 1 | 23.54 | 23.45 | 0.39% | :white_check_mark: |
| phi3-3.8b | 1 | nan | nan | nan% | :x: |
| mask-rcnn | 1 | 18.56 | 18.47 | 0.49% | :white_check_mark: |
| llama3-8b | 1 | 21.28 | 21.18 | 0.48% | :white_check_mark: |
| whisper-large-encoder | 1 | 10.22 | 10.17 | 0.49% | :white_check_mark: |
| whisper-large-decoder | 1 | 98.27 | 98.10 | 0.17% | :white_check_mark: |
| mistral-7b | 1 | 23.77 | 23.65 | 0.52% | :white_check_mark: |
| FLUX.1-schnell | 1 | 899.70 | 908.97 | -1.02% | :white_check_mark: |
This build is not recommended for merge :red_circle:
:red_circle: bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output

:x: llama2_7b: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/llama2_7b/decoder_model.onnx
```

:x: qwen1.5-7b: ERROR - check error output
```
usage: accuracy_checker.py [-h] [--onnx ONNX] [--tf TF] [--provider PROVIDER]
                           [--batch BATCH] [--fill1] [--fill0] [--fp16]
                           [--argmax] [--verbose] [--tolerance TOLERANCE]
                           [--input-dim INPUT_DIM] [--target TARGET]
                           [--ort-run] [--ort-logging]
                           [--disable-offload-copy] [--disable-fast-math]
                           [--exhaustive_tune]
accuracy_checker.py: error: unrecognized arguments: input_ids attention_mask position_ids 1 256 @attention_mask 1 256 @position_ids 1 256
```

:x: phi3-3.8b: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/onnx/onnx_parser.cpp:264: parse_from: PARSE_FROM: Failed reading onnx file: /new-saved-models/phi3-3.8b/model.onnx
```

:red_circle: mask-rcnn: FAILED: MIGraphX is not within tolerance - check verbose output
:x: whisper-large-encoder: ERROR - check error output
```
Traceback (most recent call last):
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 340, in <module>
    main()
  File "/src/AMDMIGraphX/tools/accuracy/accuracy_checker.py", line 205, in main
    model = migraphx.parse_onnx(model_name, default_dim_value=batch)
RuntimeError: /src/AMDMIGraphX/src/include/migraphx/op/convolution.hpp:100: normalize_compute_shape: CONVOLUTION: mismatched channel numbers
```