Justin Luitjens
From the CUDA 10.1 release notes (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-u1-new-features), section 3.3.1, cuBLAS Library: "With this release, on Linux systems, the cuBLAS libraries listed below are now installed in the /usr/lib/-linux-gnu/ or /usr/lib64/ directories..."
Switching from DeviceSegmentedRadix to DeviceSegmentedSort solves this.
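Both CUB algorithms sort keys independently within each segment, so the switch should not change the sorted keys. A minimal host-side reference like the one below could be used to check device output after the switch. It assumes segments are described by begin/end offset arrays; `segmented_sort_reference` is a hypothetical helper that just calls `std::sort` per segment and is not CUB's implementation of either algorithm.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host reference for a segmented key sort: each segment
// [begin_offsets[s], end_offsets[s]) is sorted on its own, and no element
// ever moves across a segment boundary.
template <typename Key>
std::vector<Key> segmented_sort_reference(std::vector<Key> keys,
                                          const std::vector<int>& begin_offsets,
                                          const std::vector<int>& end_offsets) {
  for (std::size_t s = 0; s < begin_offsets.size(); ++s) {
    std::sort(keys.begin() + begin_offsets[s], keys.begin() + end_offsets[s]);
  }
  return keys;
}
```

For keys `{3, 1, 2, 9, 8, 5}` with segments `[0,3)`, `[3,5)`, `[5,6)` this yields `{1, 2, 3, 8, 9, 5}`.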
one fix would be to make owning/non-owning a runtime property and not a compile time property.
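A minimal C++ sketch of the runtime alternative, assuming a simple host buffer: ownership is a `bool` member checked in the destructor instead of a template parameter, so owning and non-owning buffers share one type. `Buffer` and `sum` are hypothetical names, not an existing API.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical buffer whose ownership is decided at runtime instead of via a
// template parameter such as Buffer<T, Owning>.
template <typename T>
class Buffer {
 public:
  // Owning: allocates zero-initialized storage and frees it on destruction.
  explicit Buffer(std::size_t n) : data_(new T[n]()), size_(n), owning_(true) {}
  // Non-owning: wraps storage managed by the caller; never frees it.
  Buffer(T* data, std::size_t n) : data_(data), size_(n), owning_(false) {}

  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
  // Moving transfers ownership and leaves the source empty and non-owning.
  Buffer(Buffer&& other) noexcept
      : data_(std::exchange(other.data_, nullptr)),
        size_(std::exchange(other.size_, 0)),
        owning_(std::exchange(other.owning_, false)) {}

  ~Buffer() {
    if (owning_) delete[] data_;  // the runtime check replaces a type-level choice
  }

  T* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool is_owning() const { return owning_; }

 private:
  T* data_;
  std::size_t size_;
  bool owning_;
};

// Because both kinds share one type, consumers need not be templated on ownership.
inline double sum(const Buffer<double>& b) {
  double total = 0.0;
  for (std::size_t i = 0; i < b.size(); ++i) total += b.data()[i];
  return total;
}
```

The trade-off is one extra member and a branch in the destructor, in exchange for not having to template every consumer on the ownership mode.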
fixed in 7397f0103af66d72e521a0aa547d056aa120419c
I could use a good review of this. In particular some of the scalar derivatives have not been tested.
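One way to cover untested scalar derivatives is to compare each against a central finite difference, (f(x+h) - f(x-h)) / (2h), whose truncation error is O(h^2). The helper below is a hypothetical sketch, not part of this change; the step `h` and tolerance `rtol` are assumed defaults that may need tuning per function.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>

// Hypothetical check: does the analytic derivative df agree with a central
// finite difference of f at x, within a relative tolerance?
inline bool derivative_matches(const std::function<double(double)>& f,
                               const std::function<double(double)>& df,
                               double x, double h = 1e-5, double rtol = 1e-6) {
  const double numeric = (f(x + h) - f(x - h)) / (2.0 * h);
  const double analytic = df(x);
  // Scale by max(1, |analytic|) so near-zero derivatives use an absolute bound.
  return std::fabs(numeric - analytic) <= rtol * std::max(1.0, std::fabs(analytic));
}
```

Checking a few points per derivative, including points near any branch or singularity, would catch most sign and factor errors.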
closing for now. this was a proof of concept.
The standard seems to indicate that it is conforming and that the checks are unnecessary: "If the result of a function is not mathematically defined or not in the range of representable values..."
Ran a quick experiment with a modified 11.5 libcu++: I fixed the double cast and commented out the isnan checks. This data indicates the isnan checks are still very expensive...
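For context, the sketch below contrasts a plain complex multiply with one that adds the NaN/infinity recovery branches of the reference multiplication in Annex G of the C standard. It assumes the isnan checks under discussion are of this kind; `mul_naive` and `mul_checked` are hypothetical names, and this is not libcu++'s code. The checked version pays for the `isnan` tests on every call even though the recovery path only runs for special inputs.

```cpp
#include <cmath>
#include <complex>
#include <limits>

// Plain product: (a+bi)(c+di) = (ac - bd) + (ad + bc)i, no special-value handling.
inline std::complex<double> mul_naive(std::complex<double> z, std::complex<double> w) {
  const double a = z.real(), b = z.imag(), c = w.real(), d = w.imag();
  return {a * c - b * d, a * d + b * c};
}

// Same product plus Annex G-style recovery: if both parts come out NaN, infinite
// operands are "boxed" to +/-1 or +/-0 and the product is recomputed so that an
// infinite input yields an infinite result instead of NaN + NaN i.
inline std::complex<double> mul_checked(std::complex<double> z, std::complex<double> w) {
  double a = z.real(), b = z.imag(), c = w.real(), d = w.imag();
  const double ac = a * c, bd = b * d, ad = a * d, bc = b * c;
  double x = ac - bd, y = ad + bc;
  if (std::isnan(x) && std::isnan(y)) {  // this test runs on every call
    bool recalc = false;
    if (std::isinf(a) || std::isinf(b)) {  // z is infinite
      a = std::copysign(std::isinf(a) ? 1.0 : 0.0, a);
      b = std::copysign(std::isinf(b) ? 1.0 : 0.0, b);
      if (std::isnan(c)) c = std::copysign(0.0, c);
      if (std::isnan(d)) d = std::copysign(0.0, d);
      recalc = true;
    }
    if (std::isinf(c) || std::isinf(d)) {  // w is infinite
      c = std::copysign(std::isinf(c) ? 1.0 : 0.0, c);
      d = std::copysign(std::isinf(d) ? 1.0 : 0.0, d);
      if (std::isnan(a)) a = std::copysign(0.0, a);
      if (std::isnan(b)) b = std::copysign(0.0, b);
      recalc = true;
    }
    if (!recalc && (std::isinf(ac) || std::isinf(bd) || std::isinf(ad) || std::isinf(bc))) {
      // Infinity came from intermediate overflow: zero out NaN operands.
      if (std::isnan(a)) a = std::copysign(0.0, a);
      if (std::isnan(b)) b = std::copysign(0.0, b);
      if (std::isnan(c)) c = std::copysign(0.0, c);
      if (std::isnan(d)) d = std::copysign(0.0, d);
      recalc = true;
    }
    if (recalc) {
      const double inf = std::numeric_limits<double>::infinity();
      x = inf * (a * c - b * d);
      y = inf * (a * d + b * c);
    }
  }
  return {x, y};
}
```

For `(inf + NaN i) * (1 + 0i)` the naive product is `NaN + NaN i`, while the checked version returns an infinite real part. For finite inputs the two agree, so the argument above is that the standard does not require the extra branches.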
Here is the analysis that prompted this issue: ITEMS_PER_MEDIUM_THREAD impacts performance in a couple of ways. 1) If the problem size is less than WARP_SIZE * ITEMS_PER_MEDIUM_THREAD the algorithm uses a...
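The threshold arithmetic, sketched with hypothetical values: `WARP_SIZE` is 32 on NVIDIA GPUs, while the value 7 for `ITEMS_PER_MEDIUM_THREAD` is an assumption for illustration only. `kMediumThreshold` and `below_medium_threshold` are hypothetical names.

```cpp
#include <cstddef>

constexpr std::size_t WARP_SIZE = 32;
constexpr std::size_t ITEMS_PER_MEDIUM_THREAD = 7;  // hypothetical value

// Largest problem one warp covers when each lane handles ITEMS_PER_MEDIUM_THREAD items.
constexpr std::size_t kMediumThreshold = WARP_SIZE * ITEMS_PER_MEDIUM_THREAD;

// True for sizes below the cutoff, i.e. the case in which the algorithm
// takes the alternate path mentioned above.
constexpr bool below_medium_threshold(std::size_t num_items) {
  return num_items < kMediumThreshold;
}
```

The cutoff grows linearly with `ITEMS_PER_MEDIUM_THREAD`, so changing that constant moves which problem sizes take each path.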
That explains where it came from but doesn't explain why we keep it in.