nschaeff
Hello, when compiling on an IBM POWER9, nvcc fails with the following message: > /gpfs/apps/POWER9/GCC/8.3.0/lib/gcc/ppc64le-redhat-linux/8.3.0/include/c++/type_traits(335): error: identifier "__ieee128" is undefined. Passing the option `-std=c++11` to nvcc solves the issue. Maybe this...
Provides a full diff (all files) between two commits for git and mercurial. Usage: diffuse -c commit1..commit2 # displays the diff between commit1 and commit2. Provides a partial answer to issue...
For instance, within the diffuse main git directory: ``` diffuse -c 613381 # shows the modified file as expected, which is in the current working dir cd src diffuse -c...
Hello, I observed slower execution time on MI250X than on MI100, for "strided" transforms. Example for Nfft = 1024, and 20480 batched complex to complex transforms (double precision). time on...
I suggest replacing the current implementation found here: https://github.com/JuliaMath/DoubleFloats.jl/blob/c085185654507a6e73b6568a01a5c26fc47c0e0d/src/math/ops/op_dd_dd.jl#L59 with the following (pseudo-code), which is both much faster and a bit more accurate, thanks to fma: ```julia function sqrt_dd_dd(x::Tuple{T,T}) where...
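The proposed code is cut off in the excerpt above, so the following is only a minimal C++ sketch of the general technique the post alludes to: a double-double square root where `fma` supplies the residual of the leading-part square root in a single rounding. It is not the author's implementation. The `dd` alias and the name `sqrt_dd_sketch` are hypothetical, and the handling of zero, negative, and infinite inputs is an assumption.

```cpp
#include <cmath>
#include <utility>

// Hypothetical representation: a double-double value is the unevaluated sum hi + lo.
using dd = std::pair<double, double>;

// Sketch of a double-double square root with an fma-based residual.
inline dd sqrt_dd_sketch(dd x) {
    const double hi = x.first;
    const double lo = x.second;
    // Assumed edge-case policy: zero, negative (NaN) and infinite inputs
    // fall back to the plain double result with a zero low part.
    if (!(hi > 0.0) || std::isinf(hi)) {
        return {std::sqrt(hi), 0.0};
    }
    const double r = std::sqrt(hi);              // leading part of the root
    const double err = std::fma(-r, r, hi);      // hi - r*r with a single rounding
    const double corr = (err + lo) / (2.0 * r);  // first-order Newton correction
    const double s = r + corr;                   // fast two-sum, valid since |r| >= |corr|
    const double t = corr - (s - r);
    return {s, t};
}
```

The design choice here is that one `fma` replaces the error-free product that would otherwise be needed to obtain `hi - r*r`, which is where a speed gain over a split-based multiplication would plausibly come from.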
Hello, Using HIP, I currently perform cross-lane reductions with `__shfl_xor()` instructions. These seem to be a perfect match for the ds_swizzle_b32 instruction [1], but when looking at the assembly generated,...
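For readers unfamiliar with the pattern, here is a host-side C++ emulation of an XOR butterfly reduction, the access pattern that `__shfl_xor()` reductions follow. It is a sketch and not HIP code: lanes are modelled as array slots, the function name `butterfly_sum` is hypothetical, and the lane count is assumed to be a power of two.

```cpp
#include <array>
#include <cstddef>

// Host-side emulation of an XOR butterfly sum across N lanes.
// At each step, lane i combines its value with the value held by lane i ^ mask.
template <std::size_t N>
std::array<double, N> butterfly_sum(std::array<double, N> lanes) {
    static_assert(N > 0 && (N & (N - 1)) == 0, "lane count must be a power of two");
    for (std::size_t mask = N / 2; mask > 0; mask /= 2) {
        std::array<double, N> next{};
        for (std::size_t lane = 0; lane < N; ++lane) {
            // Emulates reading the partner lane's register.
            next[lane] = lanes[lane] + lanes[lane ^ mask];
        }
        lanes = next;  // all lanes advance together, as in lockstep execution
    }
    return lanes;  // every lane now holds the full sum
}
```

After log2(N) steps every lane holds the total, which is why no final broadcast is needed with this pattern.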
Hello, For small (1D) transforms that are part of a more complex workflow, it would be interesting to be able to call VkFFT from a kernel on data already residing...
In some cases, for large sizes (more below), using an R2C and C2R transform, I obtain wrong FFT results with the zeroPadding feature in the frequency domain. If I remove the...