Extending support for the binary primitive
Description
This PR extends the binary SYCL kernel to support non-uniform work-group sizes. It adds new logic for the work-item configuration at kernel launch and handles the trailing portion of the workspace. The PR also adds support for common scales and handles saturation and rounding for vectors.
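To illustrate the ideas above, here is a minimal host-side C++ sketch. It is not the PR's code and uses no SYCL API. All names (round_up, launch_config, make_launch_config, saturate_round_s8, binary_add_s8) are hypothetical. It assumes that the global range is padded up to a multiple of the work-group size, that out-of-range work-items return early, that "common scales" means one scale value per source, and that the result is rounded to nearest and saturated to int8.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: round n up to the next multiple of m.
std::size_t round_up(std::size_t n, std::size_t m) {
    assert(m > 0);
    return ((n + m - 1) / m) * m;
}

// Hypothetical launch configuration: the global size is padded so that a
// problem size that is not a multiple of the work-group size still launches.
struct launch_config {
    std::size_t global_size;
    std::size_t local_size;
};

launch_config make_launch_config(std::size_t n, std::size_t wg) {
    return {round_up(n, wg), wg};
}

// Round to nearest (ties to even under the default FE_TONEAREST mode),
// then clamp to the int8_t range before the narrowing conversion.
std::int8_t saturate_round_s8(float x) {
    float r = std::nearbyint(x);
    r = std::min(std::max(r, -128.0f), 127.0f);
    return static_cast<std::int8_t>(r);
}

// Serial simulation of the kernel body. Each loop iteration plays the role
// of one work-item; ids in the padded tail (gid >= n) do no work.
// Assumes a.size() and b.size() are at least dst.size().
void binary_add_s8(const std::vector<float>& a, const std::vector<float>& b,
                   float scale0, float scale1, std::vector<std::int8_t>& dst) {
    const std::size_t n = dst.size();
    const launch_config cfg = make_launch_config(n, 32);
    for (std::size_t gid = 0; gid < cfg.global_size; ++gid) {
        if (gid >= n) continue;  // guard for the trailing portion
        const float v = scale0 * a[gid] + scale1 * b[gid];
        dst[gid] = saturate_round_s8(v);
    }
}
```

Padding the global range and guarding the tail keeps every work-group the same size. The alternative is a separate launch for the remainder, which costs a second kernel submission.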
Checklist
General
- [x] Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit? test_binary_all.txt
- [x] Have you formatted the code using clang-format?
Performance improvements
- [x] Have you submitted performance data that demonstrates performance improvements? oneDNN Performance Tracker.xlsx
Thank you for the PR, @TejaX-Alaghari. Could you share which platform(s) you validated this change on?
Validation was performed on an Nvidia Tesla T4 GPU. I have attached the clinfo and nvidia-smi logs for reference. Let me know if any further information is required. nvidia_clinfo.txt nvidia_smi_info.txt
Closing this outdated PR; I have opened a new PR: https://github.com/oneapi-src/oneDNN/pull/1612.