[HLK] Add missing tests for `countbits`
Currently we have tests defined for the 32-bit version of `countbits`, but there is a gap for the 16-bit and 64-bit versions.
https://github.com/microsoft/DirectXShaderCompiler/blob/66287b27442d0af17a152d024a6deaadb075cd30/tools/clang/unittests/HLSLExec/ShaderOpArithTable.xml#L2609
This issue tracks the addition of HLK tests for the 16-bit and 64-bit versions so that their behaviour is covered as well.
Note:
For some additional context, the 16-bit version was observed to be inconsistent across GPUs when tested locally.
For NVIDIA, we observed that the output was incorrect when the input StructuredBuffer was of type `int16_t2` or `int16_t4`.
For WARP, we observed that the output was incorrect when the input StructuredBuffer was of type `int16_t3`.
Please also see here: https://github.com/llvm/offload-test-suite/pull/205.
AC:
- [ ] Add HLK test for 16-bit countbits
- [ ] Add HLK test for 64-bit countbits
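
For the expected values in the new table entries, a host-side reference can help cross-check the 16-bit and 64-bit results. The following is only a minimal sketch: `refCountBits` and `checkEdgeCases` are hypothetical helpers, not existing DXC or HLK functions, and returning `uint32_t` for every input width is an assumption made for simplicity.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of DXC): portable reference popcount.
// Counts the set bits of the value's bit pattern at the width of T,
// so int16_t(-1) yields 16 and int64_t(-1) yields 64.
// Assumption: the result is reported as uint32_t for all input widths.
template <typename T>
constexpr uint32_t refCountBits(T value) {
  static_assert(std::is_integral<T>::value, "integer types only");
  using U = typename std::make_unsigned<T>::type;
  U bits = static_cast<U>(value);
  uint32_t count = 0;
  while (bits != 0) {
    // Clear the lowest set bit; one iteration per set bit.
    bits = static_cast<U>(bits & (bits - 1));
    ++count;
  }
  return count;
}

// Edge cases that seem worth covering for both widths:
// zero, all ones, and the sign bit alone.
inline void checkEdgeCases() {
  assert(refCountBits<int16_t>(0) == 0);
  assert(refCountBits<int16_t>(-1) == 16);
  assert(refCountBits<int16_t>(INT16_MIN) == 1);
  assert(refCountBits<int64_t>(0) == 0);
  assert(refCountBits<int64_t>(-1) == 64);
  assert(refCountBits<int64_t>(INT64_MIN) == 1);
}
```

Given the vector-width inconsistencies noted above, it may be worth applying the same reference per component to 2-, 3-, and 4-element 16-bit inputs rather than to scalars only.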