Justin Luitjens

29 comments by Justin Luitjens

From the CUDA 10.1 release notes (https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-u1-new-features), section 3.3.1, cuBLAS Library: "With this release, on Linux systems, the cuBLAS libraries listed below are now installed in the /usr/lib/-linux-gnu/ or /usr/lib64/ directories...

Switching from DeviceSegmentedRadixSort to DeviceSegmentedSort solves this.

One fix would be to make owning/non-owning a runtime property rather than a compile-time property.
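A minimal sketch of what "ownership as a runtime property" could look like, assuming a simple float buffer. The class name `Buffer` and its members (`make_owning`, `make_view`, `owns`) are hypothetical and are not taken from the project under discussion; the point is only that one type carries a flag instead of ownership being encoded in a template parameter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container: ownership is a run-time flag, not a template parameter.
class Buffer {
public:
    // Allocate zero-initialized storage that this object will free.
    static Buffer make_owning(std::size_t n) {
        return Buffer(new float[n](), n, /*owns=*/true);
    }

    // Wrap caller-managed storage; the destructor will leave it alone.
    static Buffer make_view(float* data, std::size_t n) {
        assert(data != nullptr || n == 0);
        return Buffer(data, n, /*owns=*/false);
    }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Moving transfers the ownership flag along with the pointer.
    Buffer(Buffer&& other) noexcept
        : data_(other.data_), size_(other.size_), owns_(other.owns_) {
        other.reset();
    }

    Buffer& operator=(Buffer&& other) noexcept {
        if (this != &other) {
            release();
            data_ = other.data_;
            size_ = other.size_;
            owns_ = other.owns_;
            other.reset();
        }
        return *this;
    }

    ~Buffer() { release(); }

    float* data() const { return data_; }
    std::size_t size() const { return size_; }
    bool owns() const { return owns_; }

private:
    Buffer(float* data, std::size_t n, bool owns)
        : data_(data), size_(n), owns_(owns) {}

    // Free the storage only if this object owns it, then clear the fields.
    void release() {
        if (owns_) delete[] data_;
        reset();
    }

    void reset() {
        data_ = nullptr;
        size_ = 0;
        owns_ = false;
    }

    float* data_;
    std::size_t size_;
    bool owns_;
};
```

With this shape, functions can accept a `Buffer` without caring which kind it is, at the cost of one extra flag per object and a branch in the destructor.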

Fixed in 7397f0103af66d72e521a0aa547d056aa120419c.

I could use a good review of this. In particular, some of the scalar derivatives have not been tested.

Closing for now. This was a proof of concept.

The standard seems to indicate that it is conforming and that the checks are unnecessary: "If the result of a function is not mathematically defined or not in the range of representable values...

I ran a quick experiment in which I modified the 11.5 libcu++. The modifications were to fix the double cast and to comment out the isnan checks. This data indicates the isnan checks are still very expensive...

Here is the analysis that kicked off this issue: ITEMS_PER_MEDIUM_THREAD impacts performance in a couple of ways. 1) If the problem size is less than WARP_SIZE * ITEMS_PER_MEDIUM_THREAD, the algorithm uses a...
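To make the size condition in the comment above concrete, here is a small sketch of the cutoff. It assumes a warp size of 32; the helper name `below_medium_threshold` and the sample values 7 and 5 for ITEMS_PER_MEDIUM_THREAD are hypothetical, chosen only to show how the tuning parameter moves the threshold. What the algorithm does below the cutoff is not shown, because the comment is truncated at that point.

```cpp
#include <cstddef>

// Assumed warp width; 32 on current NVIDIA GPUs.
constexpr std::size_t WARP_SIZE = 32;

// True when the problem is smaller than WARP_SIZE * ITEMS_PER_MEDIUM_THREAD,
// which is the condition quoted in the comment. Hypothetical helper.
constexpr bool below_medium_threshold(std::size_t problem_size,
                                      std::size_t items_per_medium_thread) {
    return problem_size < WARP_SIZE * items_per_medium_thread;
}

// The same 200-item problem falls on different sides of the cutoff
// depending on the tuning value.
static_assert(below_medium_threshold(200, 7), "200 < 32 * 7 = 224");
static_assert(!below_medium_threshold(200, 5), "200 >= 32 * 5 = 160");
```

Raising the parameter widens the range of problem sizes that take the small-problem path, which is one way a tuning change can shift performance for mid-sized inputs.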

That explains where it came from but doesn't explain why we keep it in.