
of Intel

Open elevatorguy opened this issue 8 months ago • 4 comments

PyTorch Prerequisites

[!NOTE] Developers who want to run PyTorch deep learning workloads need to install only the drivers and pip install PyTorch wheels binaries. The runtime package for the Intel® Deep Learning Essentials is installed automatically during the pip installation of the PyTorch wheels binaries. — Intel

Dr. Suarez found CTranslate2 on stream through cibuildwheel. My guess is that OpenBLAS is deprecated; I have no experience with oneDNN or other oneAPI resources beyond Level Zero, and I haven't used that yet.

Found this; this file may also be of interest.

[!IMPORTANT] Developers building PyTorch from source code need to install both the driver and Intel Deep Learning Essentials. — Intel

[Image: installer screenshot]

Instead of the installer shown above, I'm using the standalone installer available for the compiler. If Intel's Deep Neural Network Library and Math Kernel Library are useful, please comment below.

$ ocloc query CL_DEVICE_EXTENSIONS
cl_ext_float_atomics cl_intel_accelerator cl_intel_command_queue_families cl_intel_device_attribute_query cl_intel_driver_diagnostics cl_intel_mem_force_host_memory cl_intel_required_subgroup_size cl_intel_spirv_subgroups cl_intel_split_work_group_barrier cl_intel_subgroup_local_block_io cl_intel_subgroups cl_intel_subgroups_char cl_intel_subgroups_long cl_intel_subgroups_short cl_intel_unified_shared_memory cl_khr_byte_addressable_store cl_khr_create_command_queue cl_khr_device_uuid cl_khr_extended_bit_ops cl_khr_external_memory cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_il_program cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_integer_dot_product cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_priority_hints cl_khr_spir cl_khr_spirv_linkonce_odr cl_khr_spirv_no_integer_wrap_decoration cl_khr_subgroup_ballot cl_khr_subgroup_clustered_reduce cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_suggested_local_work_size cl_khr_throttle_hints
$ ocloc query OCL_DRIVER_VERSION
1.0.032413
$ ocloc query CL_DEVICE_OPENCL_C_ALL_VERSIONS
"OpenCL C":1.0.0 "OpenCL C":1.1.0 "OpenCL C":1.2.0 "OpenCL C":3.0.0
$ ocloc query CL_DEVICE_OPENCL_C_FEATURES
__opencl_c_atomic_order_acq_rel:3.0.0 __opencl_c_atomic_order_seq_cst:3.0.0 __opencl_c_atomic_scope_all_devices:3.0.0 __opencl_c_atomic_scope_device:3.0.0 __opencl_c_ext_fp16_global_atomic_load_store:3.0.0 __opencl_c_ext_fp16_global_atomic_min_max:3.0.0 __opencl_c_ext_fp16_local_atomic_load_store:3.0.0 __opencl_c_ext_fp16_local_atomic_min_max:3.0.0 __opencl_c_ext_fp32_global_atomic_add:3.0.0 __opencl_c_ext_fp32_global_atomic_min_max:3.0.0 __opencl_c_ext_fp32_local_atomic_add:3.0.0 __opencl_c_ext_fp32_local_atomic_min_max:3.0.0 __opencl_c_generic_address_space:3.0.0 __opencl_c_int64:3.0.0 __opencl_c_integer_dot_product_input_4x8bit:3.0.0 __opencl_c_integer_dot_product_input_4x8bit_packed:3.0.0 __opencl_c_program_scope_global_variables:3.0.0 __opencl_c_subgroups:3.0.0 __opencl_c_work_group_collective_functions:3.0.0
$ ocloc query CL_DEVICE_PROFILE
FULL_PROFILE
:: initializing oneAPI environment...
   Initializing Visual Studio command-line environment...
   Visual Studio version 17.13.6 environment configured.
   "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\"
   Visual Studio command-line environment initialized for: 'x64'
:  compiler -- latest
:  debugger -- latest
:  dev-utilities -- latest
:  dpl -- latest
:  ocloc -- latest
:  tbb -- latest
:  umf -- latest
:: oneAPI environment initialized ::

C:\Program Files (x86)\Intel\oneAPI>ocloc query SUPPORTED_DEVICES

C:\Program Files (x86)\Intel\oneAPI>
$ icx
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2025.1.0 Build 20250317
Copyright (C) 1985-2025 Intel Corporation. All rights reserved.

icx: error: no input files
$ icpx
icpx: error: no input files

With C:\Program Files (x86)\Intel\oneAPI\compiler\2025.1\bin\common_clang64.dll present, does this mean icx/icpx is clang-compatible? Is it usable in other projects? That aside, I'd really like to use it if it includes features that facilitate hardware acceleration.

if needed: https://www.intel.com/content/www/us/en/developer/articles/technical/vectorization-llvm-gcc-cpus-gpus.html

... As a continuous effort, more performance tuning and optimizations will be added into Intel oneAPI LLVM-based compilers and GCC compilers for Intel CPUs AVX-512 and AVX-512-FP16/VNNI ISA and Intel GPUs Gen12 ISA. — Intel

Since Visual Studio Build Tools 2022 isn't available on Linux, compilation fails whenever vcruntime.h is needed with icx or icpx. I also noticed that -std=, as expected with icx on Linux, appears to be -Qstd= with icx on Windows.

https://intel.github.io/intel-extension-for-pytorch/

Note: The current implementation of the DPC++ extension only supports Linux. — Intel

As for pufferlib - bbd22d - starting with device = xpu in pufferlib/config/ocean/target.ini, Linux shows AssertionError: Torch not compiled with XPU enabled, which confirms the possibility. Windows is officially unsupported as of 2.0; after pip install -e . --break-system-packages, I get LINK : error LNK2001: unresolved external symbol PyInit_ocean\target\binding with this merge commit.
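For reference, a minimal sketch (not PufferLib's actual logic; the helper name is made up) of the kind of fallback that avoids the assertion when the installed torch isn't compiled with XPU enabled:

```python
import torch

# Hypothetical helper, not PufferLib code: prefer XPU when this torch build
# exposes it and a device is actually present, otherwise fall back.
def pick_device(preferred="xpu"):
    if preferred == "xpu" and hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())  # "cpu" on a build without XPU support
```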

Found Intel's install through their tutorial and example; both pip install commands completed successfully, seemingly without hitting any Known Issue:

C:\Program Files (x86)\Intel\oneAPI>python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
[W530 13:29:29.000000000 OperatorEntry.cpp:161] Warning: Warning only once for all operators,  other operators may also be overridden.
  Overriding a previously registered kernel for the same operator and the same dispatch key
  operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
    registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\build\aten\src\ATen\RegisterSchema.cpp:6
  dispatch key: XPU
  previous kernel: registered at C:\actions-runner\_work\pytorch\pytorch\pytorch\aten\src\ATen\VmapModeRegistrations.cpp:37
       new kernel: registered at I:\frameworks.ai.pytorch.ipex-gpu\build\Release\csrc\gpu\csrc\gpu\xpu\ATen\RegisterXPU_0.cpp:186 (function operator ())
2.7.0+xpu
2.7.10+xpu
C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\xpu\__init__.py:60: UserWarning: XPU device count is zero! (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\xpu\XPUFunctions.cpp:115.)
  return torch._C._xpu_getDeviceCount()

C:\Program Files (x86)\Intel\oneAPI>

(I didn't include torchvision or torchaudio in either pip install, and I'm guessing the Microsoft runtime isn't needed since Visual Studio Build Tools 2022 is already in use.)

As for Linux, Intel has pip, source and docker selections if needed.

May need Level Zero:

garner@linux:~$ python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/intel_extension_for_pytorch/__init__.py", line 122, in <module>
    from .utils._proxy_module import *
  File "/usr/local/lib/python3.12/dist-packages/intel_extension_for_pytorch/utils/_proxy_module.py", line 2, in <module>
    import intel_extension_for_pytorch._C
ImportError: libze_loader.so.1: cannot open shared object file: No such file or directory
garner@linux:~$
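Before installing anything, a quick way to check whether the Level Zero loader that intel_extension_for_pytorch._C needs is even visible to the dynamic loader (a sketch; the library name is taken from the error above):

```python
import ctypes

# Probe for the Level Zero loader that intel_extension_for_pytorch._C links
# against; the soname comes from the ImportError above.
try:
    ctypes.CDLL("libze_loader.so.1")
    print("libze_loader.so.1 found")
except OSError as err:
    print("Level Zero loader missing:", err)
```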

After checking out the level-zero tag v1.9.9 and installing the generated .deb - level-zero_1.9.9+l22.1_amd64.deb:

garner@linux:~$ python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.__version__); print(ipex.__version__); [print(f'[{i}]: {torch.xpu.get_device_properties(i)}') for i in range(torch.xpu.device_count())];"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.12/dist-packages/intel_extension_for_pytorch/__init__.py", line 122, in <module>
    from .utils._proxy_module import *
  File "/usr/local/lib/python3.12/dist-packages/intel_extension_for_pytorch/utils/_proxy_module.py", line 2, in <module>
    import intel_extension_for_pytorch._C
ImportError: /opt/intel/compiler/2025.1/lib/libur_loader.so.0: version `LIBUR_LOADER_0.10' not found (required by /usr/local/lib/python3.12/dist-packages/intel_extension_for_pytorch/lib/../../../../libsycl.so.8)
garner@linux:~$

There's a pending pytorch issue; I got this:

garner@linux:/opt/puffer$ puffer train puffer_target
Traceback (most recent call last):
  File "/usr/local/bin/puffer", line 5, in <module>
    from pufferlib.pufferl import main
  File "/opt/puffer/pufferlib/pufferl.py", line 28, in <module>
    import torch
  File "/usr/local/lib/python3.12/dist-packages/torch/__init__.py", line 409, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /opt/intel/compiler/2025.1/lib/libur_loader.so.0: version `LIBUR_LOADER_0.10' not found (required by /usr/local/lib/python3.12/dist-packages/torch/lib/../../../../libsycl.so.8)
garner@linux:/opt/puffer$

Note for legacy hardware: Linux Mint currently ships intel-opencl-icd (23.43.27642.40-1ubuntu3) instead of 24.35.

Is this as expected?

Processing triggers for libc-bin (2.39-0ubuntu8.4) ...
/sbin/ldconfig.real: /usr/local/lib/libccl.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbb.so.12 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libmpi.so.12 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_5.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libpti_view.so.0.10 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libmpijava.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libpstloffload.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libmpicxx.so.12 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libur_adapter_opencl.so.0 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libOpenCL.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libur_adapter_level_zero.so.0 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbbind_2_0.so.3 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libsycl-preview.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libmpifort.so.12 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libur_loader.so.0 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtbbmalloc_proxy.so.2 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libsycl.so.8 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libumf.so.0 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libhwloc.so.15 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtcm.so.1 is not a symbolic link
/sbin/ldconfig.real: /usr/local/lib/libtcm_debug.so.1 is not a symbolic link

Possibly a libsycl issue, as /opt/intel/compiler/2025.1/lib/libur_loader.so.0.11.10 exists. Would that be added here?
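One way to confirm which libur_loader.so.0 the process actually maps (Linux-only sketch; the two candidate directories are the ones from the errors above):

```python
import ctypes

# Load by soname, then read the resolved path back from /proc/self/maps to see
# whether /usr/local/lib or the oneAPI compiler directory wins.
ctypes.CDLL("libur_loader.so.0")
with open("/proc/self/maps") as maps:
    for line in maps:
        if "libur_loader" in line:
            print(line.split()[-1])
            break
```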

elevatorguy avatar May 06 '25 15:05 elevatorguy

As for pufferlib - bbd22d - starting with device = xpu in pufferlib/config/ocean/target.ini, Linux shows AssertionError: Torch not compiled with XPU enabled, which confirms the possibility. Windows is officially unsupported as of 2.0; after pip install -e . --break-system-packages, I get LINK : error LNK2001: unresolved external symbol PyInit_ocean\target\binding with this merge commit. — elevatorguy

I didn't take notes on Windows yesterday, but somehow got past libur_loader as the blocker on Linux.

 garner@linux:~$ puffer train puffer_squared
/home/garner/.local/lib/python3.12/site-packages/torch/xpu/__init__.py:120: UserWarning: XPU device count is zero! (Triggered internally at /pytorch/c10/xpu/XPUFunctions.cpp:115.)
  torch._C._xpu_init()
...
RuntimeError: No XPU devices are available.

The ... is a Traceback. I used --user with pip when reinstalling Intel's torch; yesterday I had first uninstalled pufferlib on Linux.

Today I added AppData\Roaming\Python\Python313\Scripts to PATH; while reinstalling Intel's torch on Windows I followed a process similar to yesterday's, but didn't notice a path difference with pip's --user.

Merging in c951bfd here results in the same LINK error as above when using python setup.py build_ext --inplace.

Needed set DISTUTILS_USE_SDK=1.

D:\puffer>"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\bin\HostX64\x64\link.exe" /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313\libs /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313 /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\tbb\latest\env\..\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib\clang\19\lib\windows" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\opt\compiler\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\umf\latest\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\tcm\latest\lib" /EXPORT:PyInit_binding build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.obj raylib-5.5_win64_msvc16/lib/raylibdll.lib /OUT:build\lib.win-amd64-cpython-313\pufferlib\ocean\school\binding.cp313-win_amd64.pyd /IMPLIB:build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.cp313-win_amd64.lib
   Creating library build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.cp313-win_amd64.lib and object build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.cp313-win_amd64.exp
Generating code
Finished generating code

To get the link step to succeed, some changes need to be made to setup.py, as this fails:

"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\bin\HostX64\x64\link.exe" /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313\libs /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313 /LIBPATH:C:\Users\jayg8\AppData\Local\Programs\Python\Python313\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\tbb\latest\env\..\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib\clang\19\lib\windows" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\opt\compiler\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\compiler\latest\lib" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.43.34808\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.22621.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\\lib\10.0.22621.0\\um\x64" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\umf\latest\lib" "/LIBPATH:C:\Program Files (x86)\Intel\oneAPI\tcm\latest\lib" /EXPORT:PyInit_ocean\school\binding build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.obj raylib-5.5_win64_msvc16/lib/raylibdll.lib /OUT:build\lib.win-amd64-cpython-313\pufferlib\ocean\school\binding.cp313-win_amd64.pyd /IMPLIB:build\temp.win-amd64-cpython-313\Release\pufferlib\ocean\school\binding.cp313-win_amd64.lib -fwrapv -O2

This setup.py has already diverged, but going further:

Inserting export_symbols=[ path.rstrip('.c').replace('/', '.').replace('\\','_') ], into Extension, python setup.py build_ext --inplace results in two unresolved external symbol errors on Windows - PyInit_ocean\target\binding and pufferlib.ocean_target_binding.

Just hardcoding the first parameter to "binding" results in successful linking, but then a runtime error: ImportError: cannot import name 'binding' from 'pufferlib.ocean.target' (unknown location).
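In case it helps, a hedged sketch of how the Extension could be declared so MSVC gets /EXPORT:PyInit_binding while the module keeps its dotted import path (the source path and names here are assumptions based on the errors above):

```python
from setuptools import Extension
import os

source = "pufferlib/ocean/target/binding.c"              # assumed source path
module = os.path.splitext(os.path.basename(source))[0]   # "binding"

ext = Extension(
    name="pufferlib.ocean.target.binding",   # dotted name keeps the .pyd in place
    sources=[source],
    export_symbols=[f"PyInit_{module}"],      # init symbol uses only the last
)                                             # name component, never the full path
```

That would also sidestep rstrip('.c'), which strips trailing '.' and 'c' characters rather than removing a suffix.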

elevatorguy avatar Jun 02 '25 16:06 elevatorguy

Neo requires:

Intel(R) Graphics Compiler for OpenCL(TM)
Intel(R) Graphics Memory Management
Please visit their repositories for building and installation instructions. — Intel

Have yet to clone gmmlib and intel-graphics-compiler.

Starting from release 24.35.30872.22 regular packages support Gen12 and later devices.

Support for Gen8, Gen9 and Gen11 devices will be delivered via packages with legacy1 suffix:

intel-opencl-icd-legacy1_24.35.30872.22_amd64.deb
intel-level-zero-gpu-legacy1_1.3.30872.22_amd64.deb — Intel

Without a non-zero XPU device count, I may need to reinstall Linux since both the non-legacy and legacy1 packages were installed, right? (I used apt-get remove on one of the pairs.)

dpkg --list intel* gives the following:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                                    Architecture Description
+++-=====================================-==========================================-============-================================================================================
ii  intel-igc-core                        1.0.17537.20                               amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-igc-opencl                      1.0.17537.20                               amd64        Intel(R) Graphics Compiler for OpenCL(TM)
ii  intel-level-zero-gpu-legacy1          1.3.30872.22                               amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-level-zero-gpu-legacy1-dbgsym   1.3.30872.22                               amd64        debug symbols for intel-level-zero-gpu-legacy1
ii  intel-media-va-driver:amd64           24.1.0+dfsg1-1                             amd64        VAAPI driver for the Intel GEN8+ Graphics family
un  intel-media-va-driver-non-free        <none>                                     <none>       (no description available)
ii  intel-microcode                       3.20250512.0ubuntu0.24.04.1                amd64        Processor microcode firmware for Intel CPUs
un  intel-opencl                          <none>                                     <none>       (no description available)
rc  intel-opencl-icd                      24.35.30872.22                             amd64        Intel graphics compute runtime for OpenCL
ii  intel-opencl-icd-legacy1              24.35.30872.22                             amd64        Intel graphics compute runtime for OpenCL
ii  intel-opencl-icd-legacy1-dbgsym       24.35.30872.22                             amd64        debug symbols for intel-opencl-icd-legacy1
garner@linux:/opt/neo$

elevatorguy avatar Jun 11 '25 11:06 elevatorguy

I applaud the level of detail in this. We do not officially support Windows except through WSL/Docker. Running Linux will be the 'smoothest' experience by far. Is your intent to run using XPU? Sorry, I haven't truly read through everything; I'm just seeing this and it's late for me. I would also advise joining our Discord server and creating a thread in our Support channel, as that has more visibility for us and you are more likely to receive help if this is a WIP.

leanke avatar Jun 11 '25 12:06 leanke

[Image: output with stock torch 2.7.0]

This was with stock torch - version 2.7.0.

With Intel's torch, the familiar RuntimeError: No XPU devices are available. occurs. As for puffer eval puffer_target --load-model-path latest: AssertionError: Torch not compiled with CUDA enabled. An unfamiliar error appears with puffer train puffer_target --train.device=cpu (0xc0000139 is STATUS_ENTRYPOINT_NOT_FOUND, i.e. a DLL entry point could not be resolved):

Windows fatal exception: code 0xc0000139

Thread 0x00001db0 (most recent call first):
  File "D:\puffer\pufferlib\pufferl.py", line 781 in run
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\threading.py", line 1041 in _bootstrap_inner
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\threading.py", line 1012 in _bootstrap

Current thread 0x0000266c (most recent call first):
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\autograd\graph.py", line 824 in _engine_run_backward
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\autograd\__init__.py", line 353 in backward
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\_tensor.py", line 648 in backward
  File "D:\puffer\pufferlib\pufferl.py", line 428 in train
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\distributed\elastic\multiprocessing\errors\__init__.py", line 355 in wrapper
  File "D:\puffer\pufferlib\pufferl.py", line 914 in train
  File "D:\puffer\pufferlib\pufferl.py", line 1203 in main
  File "C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Scripts\puffer.exe\__main__.py", line 7 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
C:\Users\jayg8\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\autograd\graph.py:824: UserWarning: XPU device count is zero! (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\pytorch\c10\xpu\XPUFunctions.cpp:115.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

[Image]

Changed to Python 3.12 on Windows for reasons outside of this issue.

elevatorguy avatar Jun 14 '25 22:06 elevatorguy