xehpg: jit: gemm: reduce m tile size for int4 * int8 gemv strategies
Addresses MFDNN-13752. Some of the new strategies from #2788 run out of registers. This PR reduces the m tile size, which avoids the register exhaustion and also seems to improve performance.
make test disable test_device_cpu disable build_cpu_runtime_omp disable build_cpu_runtime_sycl disable build_cpu_runtime_tbb disable arch_gpu_xe-hpc enable arch_gpu_xe-hpg-atsm enable arch_gpu_xe-hpg-dg2 disable arch_gpu_xe-lp enable arch_gpu_xe-lpg enable arch_gpu_xe-lpg+ disable arch_gpu_xe2-hpg-bmg disable arch_gpu_xe2-lpg disable arch_gpu_xe3-lpg disable benchdnn_all enable benchdnn_matmul
make test perf-gpu set primitive=matmul disable arch_gpu_xe-hpc enable arch_gpu_xe-hpg-atsm enable arch_gpu_xe-hpg-dg2 disable arch_gpu_xe-lp enable arch_gpu_xe-lpg enable arch_gpu_xe-lpg+ disable arch_gpu_xe2-hpg-bmg disable arch_gpu_xe2-lpg disable arch_gpu_xe3-lpg
CI failures are unrelated.
The large perf regressions are real but somewhat tangential to this PR, as they will disappear when MFDNN-13651 is resolved.
Addressed as part of #3390.