[GPU] Batched GEMM scale support
Description
Enable scaling for batched GEMMs that are not reshaped down to 2D.
Fixes # MFDNN-13705
Checklist
General
- [ ] Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
- [x] Have you formatted the code using clang-format?
@kealan-barbieri -- I didn't see an implementation for {a,b}scPtrDims == 3 (which I guess is the case we want here), is there a missing commit?
@petercad The cases required for MFDNN-13705 so far effectively don't use 3D ptr dims; they can all be handled by conversion to post-ops and the existing binary batch offset handling. I will add a follow-up commit to handle true 3D scales for cases with int4 weights and a nontrivial group.
@kealan-barbieri OK, so if I understand correctly the generator-side changes are not necessary for this commit, but are preparing for your next commit with true 3D scale support.
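To illustrate why per-batch scales can be folded into a post-op, here is a minimal sketch in plain Python. It is not the oneDNN implementation: it assumes one scalar scale per batch applied to A, and every name in it (`matmul`, `batched_gemm_scaled_inputs`, `batched_gemm_scale_as_post_op`, `a_scales`) is hypothetical. It shows that scaling A before the product gives the same result as multiplying the GEMM output by the same scale, with the batch index selecting the scale in the way a batch offset would select a binary post-op operand.

```python
def matmul(a, b):
    """Plain 2D matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def batched_gemm_scaled_inputs(a, b, a_scales):
    """Reference path: scale each A[batch] first, then multiply."""
    out = []
    for batch, scale in enumerate(a_scales):
        scaled_a = [[scale * v for v in row] for row in a[batch]]
        out.append(matmul(scaled_a, b[batch]))
    return out


def batched_gemm_scale_as_post_op(a, b, a_scales):
    """Post-op path: multiply first, then scale the output per batch."""
    out = []
    for batch, scale in enumerate(a_scales):
        c = matmul(a[batch], b[batch])
        # The batch index picks the scale (assumption: one scalar per batch).
        out.append([[scale * v for v in row] for row in c])
    return out


# Two batches of 2x2 matrices; power-of-two scales keep the arithmetic exact.
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
b = [[[1, 0], [0, 1]], [[2, 1], [1, 2]]]
a_scales = [0.5, 2.0]

ref = batched_gemm_scaled_inputs(a, b, a_scales)
post = batched_gemm_scale_as_post_op(a, b, a_scales)
print(ref == post)
```

The equivalence holds here because each scale is constant across the reduction dimension. Grouped scales that vary along K cannot be factored out of the sum this way, which is consistent with the int4 and nontrivial-group cases being left to a follow-up commit.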
make test set test_scope=NIGHTLY disable test_device_cpu disable benchdnn_all enable benchdnn_matmul
make test perf-gpu set primitive=matmul