xelpg: jit: gemm: additional f16 accumulation strategies
Adds f16-accumulation FMA strategies for MTL, opt-in via --attr-acc-mode=f16. Theoretical peak throughput is 2x that of f32 accumulation, and the measured speedup is similar.
make test linters
make test disable test_device_cpu disable build_cpu_runtime_omp disable build_cpu_runtime_sycl disable build_cpu_runtime_tbb disable benchdnn_all enable benchdnn_matmul