Dmitry Nikolaev

Results 7 issues of Dmitry Nikolaev

Environment variable PYTORCH_MIOPEN_SUGGEST_NHWC=1 enables MIOpen batchnorm for NHWC
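A minimal sketch of how the flag might be used from Python. Only the variable name `PYTORCH_MIOPEN_SUGGEST_NHWC` comes from the summary above. Setting it before importing `torch` is a conservative assumption, and the helper `run_nhwc_batchnorm` with its tensor shapes is hypothetical. The helper returns `None` when `torch` or a GPU is not available.

```python
import os

# Set the flag before importing torch (assumption: this guarantees it is
# visible whenever the backend reads it).
os.environ["PYTORCH_MIOPEN_SUGGEST_NHWC"] = "1"


def run_nhwc_batchnorm():
    """Run one batchnorm forward pass on a channels_last (NHWC) tensor.

    Returns the output shape, or None when torch or a GPU is unavailable.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # channels_last is PyTorch's memory format for NHWC layout.
    x = torch.randn(2, 8, 4, 4, device="cuda").contiguous(
        memory_format=torch.channels_last
    )
    bn = torch.nn.BatchNorm2d(8).cuda()
    return tuple(bn(x).shape)


result = run_nhwc_batchnorm()
```

The same effect can be had by exporting the variable in the shell before launching the script.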

`self.assertTrue(torch.equal(out1, out2))` assumes a complete match, but there is a slight difference (~1e-7) between fp32 NHWC and NCHW batchnorm output. `self.assertEqual(out1, out2)` allows for tolerance.
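To illustrate why an exact comparison is fragile, here is a small sketch in plain Python that does not use PyTorch. It assumes the difference comes from floating-point operations running in a different order, which is one common cause; the summary does not state the actual cause. The strict `==` check stands in for `torch.equal`, and `math.isclose` stands in for a tolerance-based assertion.

```python
import math

# The same three numbers summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Strict comparison: fails on any last-bit difference.
exact_match = left_to_right == right_to_left

# Tolerant comparison: accepts small relative differences.
close_match = math.isclose(left_to_right, right_to_left, rel_tol=1e-6)

print(exact_match, close_match)  # False True
```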

rocm6.4_internal_testing

This PR enables NHWC batchnorm on MIOpen in the release/2.6 branch. `ROCm version >= 6.5` and the `PYTORCH_MIOPEN_SUGGEST_NHWC_BATCHNORM=1` environment variable are required to enable NHWC batchnorm. Tested on docker image `compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-dkms-no-npi-hipclang:15845_ubuntu22.04_py3.10_pytorch_rocm6.4_internal_testing_8190c80`. New batchnorm...

NHWC batchnorm on MIOpen in preview mode. Supported modes:
* NCHW/NHWC fp32
* NCHW/NHWC fp16/bf16 mixed mode (with fp16 input/gradient and fp32 scale/bias)

Redundant NHWC-NCHW-NHWC conversions for MiopenBatchNormBackward are fixed...

This PR enables MIOpen for BF16 NCHW mixed batchnorm if ROCm >= 6.4. cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

module: rocm
open source
release notes: rocm

Batchnorm tuning is enabled for MIOpen 3.5.1 or higher. The tune policy is set according to `torch.backends.cudnn.flags(benchmark=True)` before the MIOpen batchnorm call, and the previous tuning mode is restored after the batchnorm call.
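The set-then-restore pattern described here can be sketched with a context manager. This is an illustration only, not the PR's implementation: `TuningState`, `tuning_mode`, and the mode names `"search"` and `"default"` are all hypothetical stand-ins and do not correspond to MIOpen or PyTorch APIs.

```python
from contextlib import contextmanager


class TuningState:
    """Hypothetical stand-in for a backend handle that stores a tuning mode."""

    def __init__(self, mode="default"):
        self.mode = mode


@contextmanager
def tuning_mode(state, benchmark):
    """Set the tuning mode for the duration of a block, then restore it."""
    previous = state.mode
    # Hypothetical mapping: benchmark=True selects a searching policy.
    state.mode = "search" if benchmark else "default"
    try:
        yield state
    finally:
        # Restore the earlier mode even if the wrapped call raises.
        state.mode = previous


state = TuningState()
seen = []
with tuning_mode(state, benchmark=True):
    seen.append(state.mode)  # the batchnorm call would run here
```

Using `try`/`finally` keeps the restore step from being skipped when the wrapped call fails, which is the main reason to prefer this shape over a manual set and reset.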