[Build] choose_qparams_tensor_out has the wrong return type, causing a build failure on native Windows
🐛 Describe the bug
When building on native Windows, I encountered the following undefined symbol error at link time:
lld-link : error : undefined symbol: __declspec(dllimport) class std::tuple<class at::Tensor &, class at::Tensor &> __cdecl torch::executor::native::choose_qparams_tensor_out(class at::Tensor const &, __int64, __int64, double, enum c10::ScalarType, class at::Tensor &, class at::Tensor &)
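The linker is looking for a function that returns std::tuple<Tensor&, Tensor&>, but op_choose_qparams.cpp defines choose_qparams_tensor_out as returning std::tuple<Tensor, Tensor>. Under the Microsoft ABI the return type is part of the mangled symbol name, so the definition does not satisfy the reference. This issue can be worked around with the following patch.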
diff --git a/kernels/quantized/cpu/op_choose_qparams.cpp b/kernels/quantized/cpu/op_choose_qparams.cpp
index 47f261407..9bda17192 100644
--- a/kernels/quantized/cpu/op_choose_qparams.cpp
+++ b/kernels/quantized/cpu/op_choose_qparams.cpp
@@ -149,7 +149,7 @@ void choose_qparams(
}
} // namespace
-std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
+std::tuple<Tensor&, Tensor&> choose_qparams_tensor_out(
const Tensor& input,
int64_t quant_min,
int64_t quant_max,
@@ -164,7 +164,7 @@ std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
return {scale_out, zero_point_out};
}
-::std::tuple<Tensor, Tensor> choose_qparams_tensor_out(
+::std::tuple<Tensor&, Tensor&> choose_qparams_tensor_out(
RuntimeContext& context,
const Tensor& input,
int64_t quant_min,
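To illustrate why the return type matters, here is a minimal standalone sketch. It is not the ExecuTorch code: `Tensor` is a hypothetical stand-in struct, and `choose_qparams_out_sketch` and `returns_aliases` are made-up names. It shows that the two tuple types are distinct, and that the patched signature hands back references to the caller's out tensors rather than copies.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for at::Tensor so the sketch compiles on its own.
struct Tensor {
  double value = 0.0;
};

// tuple<Tensor, Tensor> and tuple<Tensor&, Tensor&> are different types.
// A declaration using one and a definition using the other therefore do not
// describe the same function signature under an ABI that mangles return types.
static_assert(
    !std::is_same<std::tuple<Tensor, Tensor>,
                  std::tuple<Tensor&, Tensor&>>::value,
    "value tuple and reference tuple must be distinct types");

// Mirrors the shape of the patched signature: the returned tuple holds
// references to the out arguments, not copies of them.
std::tuple<Tensor&, Tensor&> choose_qparams_out_sketch(
    double scale,
    double zero_point,
    Tensor& scale_out,
    Tensor& zero_point_out) {
  scale_out.value = scale;
  zero_point_out.value = zero_point;
  return std::tuple<Tensor&, Tensor&>(scale_out, zero_point_out);
}

// Returns true when the tuple elements alias the caller's out tensors.
bool returns_aliases() {
  Tensor scale_out;
  Tensor zero_point_out;
  auto result = choose_qparams_out_sketch(0.5, 3.0, scale_out, zero_point_out);
  return &std::get<0>(result) == &scale_out &&
         &std::get<1>(result) == &zero_point_out;
}
```

Calling `returns_aliases()` from a test or from `main` should return true. On toolchains that follow the Itanium ABI the return type of an ordinary function is not encoded in the symbol name, which is probably why the mismatch only shows up as a link error with lld-link on Windows.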
Versions
Collecting environment information...
PyTorch version: 2.5.0.dev20240716+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 11 Pro
GCC version: Could not collect
Clang version: 18.1.8
CMake version: version 3.30.2
Libc version: N/A
Python version: 3.10.0 | packaged by conda-forge | (default, Nov 10 2021, 13:20:59) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.22631-SP0
Is CUDA available: False
CUDA runtime version: 12.2.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Ti
Nvidia driver version: 551.76
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=3501
DeviceID=CPU0
Family=107
L2CacheSize=16384
L2CacheSpeed=
Manufacturer=AuthenticAMD
MaxClockSpeed=3501
Name=AMD Ryzen Threadripper PRO 3975WX 32-Cores
ProcessorType=3
Revision=12544
Versions of relevant libraries:
[pip3] executorch==0.4.0a0+a70d070
[pip3] numpy==1.21.3
[pip3] torch==2.5.0.dev20240716+cpu
[pip3] torchaudio==2.4.0.dev20240716+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240716+cpu
[conda] executorch 0.4.0a0+a70d070 pypi_0 pypi
[conda] numpy 1.21.3 pypi_0 pypi
[conda] torch 2.5.0.dev20240716+cpu pypi_0 pypi
[conda] torchaudio 2.4.0.dev20240716+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240716+cpu pypi_0 pypi