RuntimeError: could not create an engine

Open aruhela opened this issue 1 year ago • 1 comments

Hi Intel Team

I am seeing a "could not create an engine" error when running the demo.py example from "oneCCL Bindings for PyTorch Getting Started Sample*". The code is run on a Sapphire Rapids node with 4 PVCs on a TACC system. Any suggestions for identifying the cause and fixing it?

(base) c551-003pvc$ mpirun -n 2 -l python demo.py -dev xpu
[0] Runing Iteration: 0 on device xpu:0
[0] Runing forward: 0 on device xpu:0
[0] Traceback (most recent call last):
[0]   File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[1] Runing Iteration: 0 on device xpu:1
[1] Runing forward: 0 on device xpu:1
[1] Traceback (most recent call last):
[1]   File "/scratch/05231/aruhela/demo.py", line 67, in <module>
[0]     res = model(input)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1]     res = model(input)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0]     return self._call_impl(*args, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1]     return self._call_impl(*args, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0]     return forward_call(*args, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[1]     return forward_call(*args, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
[0]     else self._run_ddp_forward(*inputs, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[1]     else self._run_ddp_forward(*inputs, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
[0]     return self.module(*inputs, **kwargs)  # type: ignore[index]
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1]     return self.module(*inputs, **kwargs)  # type: ignore[index]
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0]     return self._call_impl(*args, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1]     return self._call_impl(*args, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0]     return forward_call(*args, **kwargs)
[0]   File "/scratch/05231/aruhela/demo.py", line 26, in forward
[1]     return forward_call(*args, **kwargs)
[1]   File "/scratch/05231/aruhela/demo.py", line 26, in forward
[0]     return self.linear(input)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[1]     return self.linear(input)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
[0]     return self._call_impl(*args, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[1]     return self._call_impl(*args, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
[0]     return forward_call(*args, **kwargs)
[0]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[1]     return forward_call(*args, **kwargs)
[1]   File "/scratch/projects/compilers/intel24.2/oneapi/intelpython/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
[0]     return F.linear(input, self.weight, self.bias)
[0] RuntimeError: could not create an engine
[1]     return F.linear(input, self.weight, self.bias)
[1] RuntimeError: could not create an engine
(base) c551-003pvc$

Notes:

oneAPI release: 2024.2

Install command (AI Selector Tool):

conda install -c intel -c conda-forge --override-channels intel/label/oneapi::intel-extension-for-pytorch=2.1.20 intel/label/oneapi::pytorch=2.1.0 intel/label/oneapi::oneccl_bind_pt=2.1.200 intel/label/oneapi::torchvision=0.16.0 intel/label/oneapi::torchaudio=2.1.0 conda-forge::deepspeed=0.14.0 python=3.9

Thanks,
Amit Ruhela

Jul 06 '24 20:07 aruhela

Hi, I also experienced this error. The message before the exception:

File c:\Users\xiaoy\anaconda3\envs\llm2\Lib\site-packages\torch\nn\modules\linear.py:125, in Linear.forward(self, input)
    124 def forward(self, input: Tensor) -> Tensor:
--> 125     return F.linear(input, self.weight, self.bias)

RuntimeError: could not create an engine

GPU: Intel ARC B580 (with the latest driver)
OS: Windows 11
Conda/Python: 3.12
PyTorch instance: pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/xpu
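Both tracebacks in this thread end in the same F.linear call. To help narrow this down, below is a minimal sketch that records which relevant packages are installed and tries a single linear layer on the XPU device with no DDP or oneCCL involved. It is not a fix. The helper names and the distribution names in PACKAGES are assumptions for illustration (conda and pip may register the packages under different names), and the script only reports what it finds.

```python
import importlib.metadata as md

# Distribution names are assumptions; adjust them to match your environment.
PACKAGES = [
    "torch",
    "intel-extension-for-pytorch",
    "oneccl-bind-pt",
    "torchvision",
    "torchaudio",
    "deepspeed",
]


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None
    return versions


def try_xpu_linear():
    """Run one small Linear layer on 'xpu' and describe the outcome as a string."""
    try:
        import torch
    except ImportError:
        return "torch is not importable"
    try:
        # Older stacks register the xpu backend through this import;
        # recent PyTorch XPU builds do not need it.
        import intel_extension_for_pytorch  # noqa: F401
    except Exception:
        pass
    xpu = getattr(torch, "xpu", None)
    try:
        if xpu is None or not xpu.is_available():
            return "no XPU device visible to torch"
        layer = torch.nn.Linear(4, 2).to("xpu")
        out = layer(torch.randn(3, 4, device="xpu"))
        return f"ok, output shape {tuple(out.shape)}"
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
    print("single-device linear:", try_xpu_linear())
```

If the single-device check already reports the same RuntimeError, the problem is likely below the DDP/oneCCL layer (driver, runtime, or package mismatch) rather than in the distributed setup.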

Dec 27 '24 17:12 xyang2013