
CUDA 8 - Unified Memory - Pascal

Open deeplearning-ai-research opened this issue 9 years ago • 2 comments

Hi,

CUDA 8 supports unified memory on Pascal cards (a GTX 1060 costs about 250 USD...). With it, there is no need for explicit memory transfers, and latency is very low.

Are there any plans to expose CUDA 8's unified memory functionality? It would be a significant performance boost.

Sample code is here: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-simplifying

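// C++ CUDA code to allocate 32 GB for the GPU, using CPU RAM (very low latency with NVLink).
#include <cuda_runtime.h>

void allocate_model() {
    char *data = nullptr;
    // Use a 64-bit literal: 32*1024*1024*1024 would overflow int.
    size_t size = 32ULL * 1024 * 1024 * 1024;
    cudaMallocManaged(&data, size);
}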
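
To make the "no memory transfer" point concrete, here is a minimal sketch of the pattern from the linked guide: the host and a kernel touch the same managed pointer, with no `cudaMemcpy` in between. The names `scale` and `scale_on_gpu` are made up for this illustration, and the array size is arbitrary; treat it as a sketch rather than a reference implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Multiply every element in place. The kernel receives the same pointer the host uses.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns true if the values read back on the host match.
bool scale_on_gpu(size_t n, float factor) {
    float *data = nullptr;
    if (n == 0 || cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess)
        return false;

    // The host writes directly into managed memory.
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    unsigned int threads = 256;
    unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, factor);

    // Check the launch, then wait for the kernel before the host reads again.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaDeviceSynchronize() == cudaSuccess;
    for (size_t i = 0; ok && i < n; ++i) ok = (data[i] == factor);

    cudaFree(data);
    return ok;
}

int main() {
    bool ok = scale_on_gpu(1 << 20, 2.0f);
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

The `cudaDeviceSynchronize()` call matters: the kernel launch is asynchronous, so the host should not read the managed buffer until the kernel has finished.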

I'd be happy to take a patch.

inducer, Aug 06 '16 18:08

Hello, it sounds like it would give a major performance boost, since there is no need to allocate memory on both sides.

The CUDA 8 functionality (Pascal only) comes with C++ samples in the toolkit package: https://developer.nvidia.com/cuda-toolkit

With the Pascal GTX 1060 under 300 USD, this would be very useful.

arita37, Aug 06 '16 21:08