DirectML Exception 80070057 "The parameter is incorrect"
Describe the issue
I exported the PyTorch GoogLeNet model (https://pytorch.org/hub/pytorch_vision_googlenet) to ONNX using TorchDynamo (the resulting ONNX model is attached in the next section). Inference works with ONNX Runtime 1.17.3 using the CPU and CUDA providers, but it fails with the DirectML provider. The full exception is:
C:\build\ort-1.17.3\onnxruntime\onnxruntime\core\providers\dml\DmlExecutionProvider\src\DmlGraphFusionHelper.cpp(451)\onnxruntime.dll!00007FFE11451461: (caller: 00007FFE114311D1) Exception(1) tid(ae54) 80070057 The parameter is incorrect.
I enabled the DirectML debug layers, but they did not provide more insight:
C:\__w\1\s\SharedValidation\GraphDescValidator.h(34)\DirectML.dll!00007FFE0FB010A5: (caller: 00007FFE0FAD9CC5) Exception(1) tid(ae54) 80070057 The parameter is incorrect.
Exception thrown at 0x00007FFF5CE4AB89 in onnxruntime_perf_test.exe: Microsoft C++ exception: wil::ResultException at memory location 0x0000004F310FB920.
Exception thrown at 0x00007FFF5CE4AB89 in onnxruntime_perf_test.exe: Microsoft C++ exception: [rethrow] at memory location 0x0000000000000000.
C:\__w\1\s\Product\DmlDevice.cpp(782)\DirectML.dll!00007FFE0FD5069E: (caller: 00007FFE11450F65) ReturnHr(1) tid(ae54) 80070057 The parameter is incorrect.
Msg:[C:\__w\1\s\SharedValidation\GraphDescValidator.h(34)\DirectML.dll!00007FFE0FB010A5: (caller: 00007FFE0FAD9CC5) Exception(1) tid(ae54) 80070057 The parameter is incorrect.
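The HRESULT 80070057 that every log line above ends with can be unpacked with a few lines of Python. This is only a sketch and not part of the original report: the helper names (`decode_hresult`, `parse_wil_line`) are hypothetical, and the regular expression assumes nothing beyond the line layout seen in the logs above.

```python
import re
from typing import NamedTuple, Optional


class HResult(NamedTuple):
    failed: bool   # severity bit (bit 31): set for failures
    facility: int  # bits 16-26: which subsystem produced the code
    code: int      # bits 0-15: the error code inside that facility


def decode_hresult(hex_text: str) -> HResult:
    """Split an 8-digit hexadecimal HRESULT into its standard bit fields."""
    value = int(hex_text, 16)
    return HResult(
        failed=bool((value >> 31) & 1),
        facility=(value >> 16) & 0x7FF,
        code=value & 0xFFFF,
    )


# Assumed layout, taken from the log lines in this report:
#   <source file>(<line>)\<module>!<address>: ... tid(<id>) <HRESULT> <message>
# An optional leading "Msg:[" (seen on one line) is skipped.
_LOG_LINE = re.compile(
    r"^(?:Msg:\[)?(?P<file>.+?)\((?P<line>\d+)\)\\(?P<module>[^!]+)!"
    r".*?tid\([0-9A-Fa-f]+\)\s+(?P<hr>[0-9A-Fa-f]{8})\s+(?P<msg>.*)$"
)


def parse_wil_line(text: str) -> Optional[dict]:
    """Return source file, line, module and decoded HRESULT, or None if no match."""
    match = _LOG_LINE.match(text.strip())
    if match is None:
        return None
    return {
        "file": match["file"],
        "line": int(match["line"]),
        "module": match["module"],
        "hresult": decode_hresult(match["hr"]),
        "message": match["msg"],
    }
```

For 80070057 this yields facility 7 (FACILITY_WIN32) and code 87 (ERROR_INVALID_PARAMETER), which is the value Windows defines as E_INVALIDARG. The code is generic, so the source file and line in each log entry (here DmlGraphFusionHelper.cpp(451) and GraphDescValidator.h(34)) say more about where the argument was rejected than the HRESULT does.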
To reproduce
- Unzip the attached ONNX Model: gnet_dynamo.zip
- Run the onnxruntime_perf_test binary (built from source from the 1.17.3 branch) with the following arguments:
onnxruntime_perf_test.exe -e dml -m times -r 5 -p profile_gnet_dynamo_dml.json -I gnet_dynamo.onnx
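To compare the three providers side by side, the same command can be generated once per provider. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the `cpu` and `cuda` values for `-e` and the per-provider profile file names are extrapolated from the command above, and it is assumed that the tool exits with a non-zero code when the run fails.

```python
import subprocess
from typing import Dict, List, Sequence


def build_perf_test_cmd(
    provider: str,
    model: str = "gnet_dynamo.onnx",
    runs: int = 5,
    exe: str = "onnxruntime_perf_test.exe",
) -> List[str]:
    """Build the argument list from the report, with only the provider varied."""
    return [
        exe,
        "-e", provider,                               # execution provider
        "-m", "times",                                # run a fixed number of times
        "-r", str(runs),                              # number of runs
        "-p", f"profile_gnet_dynamo_{provider}.json", # profiling output (name pattern assumed)
        "-I",                                         # flag copied from the original command
        model,
    ]


def run_comparison(providers: Sequence[str] = ("cpu", "cuda", "dml")) -> Dict[str, int]:
    """Run the benchmark once per provider and collect the exit codes."""
    exit_codes = {}
    for provider in providers:
        # Requires onnxruntime_perf_test.exe and the model in the working directory.
        proc = subprocess.run(
            build_perf_test_cmd(provider), capture_output=True, text=True
        )
        exit_codes[provider] = proc.returncode
    return exit_codes
```

Calling `run_comparison()` from the directory that contains the binary and gnet_dynamo.onnx should, per the report, succeed for the CPU and CUDA providers and fail only for DirectML.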
Urgency
No response
Platform
Windows
OS Version
10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.17.3
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU, CUDA, DirectML
Execution Provider Library Version
No response
I have the same problem on some older GPUs (for example, the NVIDIA GeForce GTX 780).
However, the problem does not occur on some graphics cards newer than the NVIDIA GeForce GTX 980.
Error Code:
RUNTIME_EXCEPTION
Error Message:
Non-zero status code returned while running Mul node. Name:'/G/encoder_level1/encoder_level1.0/norm1/body/Mul' Status Message:
D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2449)\onnxruntime.dll!00007FFA2C65ADD5: (caller: 00007FFA2C65A468) Exception(6) tid(4764) 80070057 The parameter is incorrect.
Problem GPU Device: NVIDIA GeForce GTX 780
Driver Version: 471.11 (2021.6.23)
Platform: Windows
OS Version: 10
ONNX Runtime Installation: Download from github release page
ONNX Runtime Version or Commit ID: 1.16.23.1119
ONNX Runtime API: C++
Architecture: X64
Execution Provider: Default CPU, DirectML
Execution Provider Library Version: DirectML.dll version: 1.13.1.0
https://github.com/microsoft/onnxruntime/issues/20742
See https://github.com/microsoft/onnxruntime/issues/20742#issuecomment-2241622555