Dmitry Nikolaev
Environment variable PYTORCH_MIOPEN_SUGGEST_NHWC=1 enables MIOpen batchnorm for NHWC
`self.assertTrue(torch.equal(out1, out2))` assumes a complete match, but fp32 NHWC and NCHW batchnorm outputs differ slightly (~1e-7). `self.assertEqual(out1, out2)` allows for tolerance.
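The difference between an exact and a tolerant comparison can be illustrated with a minimal pure-Python sketch. It is an analogy only: it assumes the small NHWC/NCHW mismatch behaves like ordinary floating-point rounding noise (for example from a different accumulation order), which the commit message does not state, and it uses `math.isclose` as a stand-in for the tolerance check that `assertEqual` performs in the PyTorch test suite.

```python
import math

# The same three values accumulated in two different orders.
# Floating-point addition is not associative, so the results
# can differ in the last bits.
forward = (0.1 + 0.2) + 0.3
backward = (0.3 + 0.2) + 0.1

# Exact comparison: analogous to torch.equal, fails on tiny differences.
exact_match = forward == backward

# Tolerant comparison: analogous to assertEqual with a tolerance.
# rel_tol=1e-6 is an illustrative value, not the one PyTorch uses.
close_match = math.isclose(forward, backward, rel_tol=1e-6)

print(exact_match, close_match)
```

An exact check turns harmless rounding noise into a test failure, so a tolerance-based check is the better fit when two layouts compute the same quantity along different paths.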
rocm6.4_internal_testing
This PR enables NHWC batchnorm on MIOpen in the release/2.6 branch. `ROCm version >= 6.5` and the `PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1` environment variable are required to enable NHWC batchnorm. Tested on docker image `compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-dkms-no-npi-hipclang:15845_ubuntu22.04_py3.10_pytorch_rocm6.4_internal_testing_8190c80`. New batchnorm...
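The two conditions named in this PR description (ROCm version and environment variable) could be combined roughly as in the sketch below. The helper `nhwc_batchnorm_enabled` and its arguments are hypothetical and exist only for illustration; the actual check lives inside PyTorch and may parse the version or the variable differently.

```python
import os

# Name of the variable as given in the PR description.
ENV_VAR = "PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM"


def nhwc_batchnorm_enabled(rocm_version, env=None):
    """Hypothetical helper: True when both stated conditions hold.

    rocm_version: (major, minor) tuple, e.g. (6, 5).
    env: mapping to read the variable from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    version_ok = tuple(rocm_version) >= (6, 5)
    # Assumption: only the literal value "1" enables the feature.
    flag_ok = env.get(ENV_VAR) == "1"
    return version_ok and flag_ok


print(nhwc_batchnorm_enabled((6, 5), {ENV_VAR: "1"}))
```

Passing the environment as a mapping keeps the sketch testable without modifying the real process environment.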
NHWC batchnorm on MIOpen in preview mode. Supported modes:
* NCHW/NHWC fp32
* NCHW/NHWC fp16/bf16 mixed mode (with fp16 input/gradient and fp32 scale/bias)

Redundant NHWC-NCHW-NHWC conversions for MiopenBatchNormBackward are fixed...
This PR enables MIOpen for BF16 NCHW Mixed batchnorm if ROCm >= 6.4. cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd
Batchnorm tuning is enabled for MIOpen 3.5.1 or higher. The tune policy is set according to `torch.backends.cudnn.flags(benchmark=True)` before the MIOpen batchnorm call, and the previous tuning mode is restored after the call.
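The set-then-restore behavior described here follows a common save/restore pattern, sketched below in plain Python. `FakeTuner`, its `policy` attribute, the policy names `"search"` and `"none"`, and `tuning_policy` are all hypothetical stand-ins; the real change is made inside PyTorch's MIOpen integration and is not reproduced here.

```python
from contextlib import contextmanager


class FakeTuner:
    """Hypothetical stand-in for the backend's tuning state."""

    def __init__(self):
        self.policy = "none"


@contextmanager
def tuning_policy(tuner, benchmark):
    """Set the tuning policy for one call, then restore the old one."""
    previous = tuner.policy
    # Assumption: benchmark=True maps to an exhaustive search policy.
    tuner.policy = "search" if benchmark else "none"
    try:
        yield tuner
    finally:
        # Restore even if the wrapped call raises.
        tuner.policy = previous


tuner = FakeTuner()
with tuning_policy(tuner, benchmark=True):
    policy_during_call = tuner.policy  # the batchnorm call would run here
policy_after_call = tuner.policy
print(policy_during_call, policy_after_call)
```

Restoring in a `finally` block keeps a failed batchnorm call from leaving the tuning mode changed for later operations.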