QPyTorch Enablement on ROCm

This PR contains the following two changes:

__forceinline__ needs inline and always_inline on ROCm.
extra_include_paths is required for the hipification of quant_cuda header files.
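
For illustration, the first change could look roughly like the sketch below. This is a minimal example under stated assumptions, not the exact code in this PR: the `#ifndef` guard, the compiler checks, and the helper `low_bits_mask` are hypothetical and only show how a fallback `__forceinline__` built from `inline` and `always_inline` might be written.

```cpp
// Sketch only: a fallback definition of __forceinline__ for toolchains that
// do not already provide it. The guard and compiler checks are assumptions.
#ifndef __forceinline__
#if defined(__GNUC__) || defined(__clang__)
// GCC and Clang (including the Clang-based HIP compiler) accept this attribute.
#define __forceinline__ inline __attribute__((always_inline))
#else
// Plain inline as a conservative fallback for other compilers.
#define __forceinline__ inline
#endif
#endif

// Hypothetical helper (not part of QPyTorch) that uses the macro:
// returns a mask with the lowest n bits set.
__forceinline__ unsigned int low_bits_mask(unsigned int n) {
  // Shifting a 32-bit value by 32 or more is undefined, so handle it explicitly.
  return n >= 32u ? 0xFFFFFFFFu : ((1u << n) - 1u);
}
```

Because of the `#ifndef` guard, the fallback does nothing when a CUDA or HIP header has already defined `__forceinline__` as a macro.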

Feb 12 '25 22:02 rraminen

Hi @Tiiiger, could you please review this PR?

Mar 07 '25 02:03 rraminen

Hi @rraminen, +1 for this PR. I have one proposal: instead of adding a definition of __forceinline__, we can add

diff --git a/qtorch/quant/quant_cuda/bit_helper.cu b/qtorch/quant/quant_cuda/bit_helper.cu
index 794255f..c741d58 100644
--- a/qtorch/quant/quant_cuda/bit_helper.cu
+++ b/qtorch/quant/quant_cuda/bit_helper.cu
@@ -1,3 +1,5 @@
+#include <cuda.h>
+
 #define FLOAT_TO_BITS(x) (*reinterpret_cast<unsigned int*>(x))
 #define BITS_TO_FLOAT(x) (*reinterpret_cast<float*>(x))

diff --git a/qtorch/quant/quant_cuda/sim_helper.cu b/qtorch/quant/quant_cuda/sim_helper.cu
index d165793..5a81493 100644
--- a/qtorch/quant/quant_cuda/sim_helper.cu
+++ b/qtorch/quant/quant_cuda/sim_helper.cu
@@ -1,3 +1,4 @@
+#include <cuda.h>
 #include "quant_kernel.h"
 #include <cmath>

so that the definition provided by the HIP library is used.

Jul 22 '25 13:07 k-artem

Hi @stevenygd @Tiiiger, could you please look at this PR? Thanks in advance.

Jul 22 '25 13:07 k-artem