Peter Caday

6 issues by Peter Caday

Omnibus pulldown from upstream gemmstone repository. Highlights include:

* Long-overdue refactoring of register layouts into their own class, `RegisterLayout`
* Numerous copy planner optimizations/fixes:
  - New dedicated upconversions to bf16...

platform:gpu-intel

Adds some f16 accumulation FMA strategies (opt-in with `--attr-acc-mode=f16`) for MTL. Theoretical peak throughput is 2x that of f32 accumulation, and the measured speedup is similar.

platform:gpu-intel

Backport of #3357 to `rls-v3.9-pc`.

platform:gpu-intel
backport

Addresses MFDNN-13752. Some of the new strategies from #2788 run out of registers -- this PR reduces the m tile size, which avoids the register exhaustion and also appears to improve performance.

platform:gpu-intel

Backport of #3357 to `rls-v3.8`.

platform:gpu-intel
backport

POC of nf4 weights decompression for Intel GPUs (MFDNN-13636), to allow OpenVINO to test it out. Adds a new nf4 data type (may not be final design -- just for...

platform:gpu-intel
component:api
component:tests
component:common
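
For readers unfamiliar with nf4, the following is a minimal sketch of what weights decompression typically means for this format. It is not taken from the PR. It assumes the 16-entry NF4 code book published with QLoRA/bitsandbytes, two 4-bit codes packed per byte, and a single floating-point scale per group. The function name `decompress_nf4` and the `low_nibble_first` parameter are hypothetical, and the actual nibble order and scale layout used on Intel GPUs may differ.

```python
from typing import List

# NF4 code book as published with QLoRA / bitsandbytes.
# Assumption: the PR's nf4 type uses the same 16 values.
NF4_TABLE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def decompress_nf4(packed: bytes, scale: float,
                   low_nibble_first: bool = True) -> List[float]:
    """Expand packed 4-bit NF4 codes into floats (hypothetical helper).

    Each byte holds two codes; each code indexes NF4_TABLE and the
    looked-up value is multiplied by the group scale.
    """
    out = []
    for byte in packed:
        lo, hi = byte & 0x0F, byte >> 4
        # Nibble order is an assumption; flip it with low_nibble_first.
        first, second = (lo, hi) if low_nibble_first else (hi, lo)
        out.append(NF4_TABLE[first] * scale)
        out.append(NF4_TABLE[second] * scale)
    return out


if __name__ == "__main__":
    # 0xF0: low nibble 0 (-1.0), high nibble 15 (1.0), scaled by 2.0
    print(decompress_nf4(bytes([0xF0]), 2.0))
```

In this sketch the table lookup replaces any arithmetic dequantization, which is why nf4 needs dedicated upconversion support rather than a plain integer-to-float cast.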