compiler: Unified Memory Allocator
Hello everyone,
We at SENAI CIMATEC are working on a Unified Memory Allocator for Devito through the CuPy library.
The first results with this new allocator are impressive when using checkpointing, compared to the default allocator on GPU.
In our experiment with the Overthrust model (894x884x299), the performance gain is close to three times* relative to the default allocator in Devito.
With this approach we expect to be able to allocate memory beyond the GPU capacity in the future.
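For context, this relies on CUDA managed (unified) memory, which CuPy exposes directly; managed pages migrate between host and device on demand, which is what makes oversubscribing the GPU memory possible. The snippet below is only a minimal sketch of that CuPy mechanism, not the actual Devito allocator, and the shape/dtype are arbitrary:

import ctypes

import cupy as cp
import numpy as np

# Route CuPy allocations through cudaMallocManaged (CUDA unified memory).
# The memory pool is optional; it just avoids repeated cudaMallocManaged calls.
pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)

shape, dtype = (512, 512, 512), np.float32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

# Managed memory is addressable from both host and device.
memptr = pool.malloc(nbytes)

# Wrap the same memory as a host-side NumPy array (no copy); this is the kind
# of view that the host-side data could be backed by.
buf = (ctypes.c_byte * nbytes).from_address(memptr.ptr)
host_view = np.frombuffer(buf, dtype=dtype).reshape(shape)
host_view[:] = 0.0  # touched on the CPU; pages migrate when the GPU accesses them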
We will open a Draft PR to align this allocator with Devito's patterns, fix possible bugs, and open it up for community use :)
A version enabling this through the External Allocator is also in development, and we expect to share it soon.
All feedback is welcome :)
Thank you to everyone at CIMATEC and on the Devito team for making this possible :slightly_smiling_face:
*All experiments used an Nvidia V100 with 32 GB of memory.
CuPy version 8.3.0 was used.
ToDo:
- Distributed GPU allocation
@FabioLuporini, we added tests to this allocator; could we proceed with this PR?
The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.
So we need conditional imports, a mechanism to emit suitable error messages in case one attempts to use the allocator but it isn't available, and so on.
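For illustration, the conditional-import pattern could look roughly like the sketch below (class and method names are hypothetical, not the actual allocators.py change):

try:
    import cupy as cp
except ImportError:
    # CuPy is an optional dependency; the allocator below degrades gracefully.
    cp = None

class CupyAllocator:  # in Devito this would hook into the existing allocator machinery
    """Sketch of a CuPy-backed allocator that fails loudly when CuPy is missing."""

    @classmethod
    def available(cls):
        return cp is not None

    def alloc(self, nbytes):
        if cp is None:
            raise RuntimeError("CupyAllocator requested but CuPy is not installed; "
                               "install a wheel matching your CUDA toolkit, "
                               "e.g. `pip install cupy-cuda12x`")
        # Allocate CUDA managed (unified) memory through CuPy.
        return cp.cuda.malloc_managed(nbytes)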
Should this be added to the GPU CI and actually tested on the nvidia run?
We've updated the allocators.py file to use a conditional import for cupy. I think this solves the problem.
Can you check if #2171 fixes the install issue?
@mloubout, definitely, we will test it. BTW, this docker image worked too, thanks to you.
FROM nvcr.io/nvidia/nvhpc:23.5-devel-cuda12.1-ubuntu22.04 AS devel
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        git \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        python3-wheel \
        python3.10-venv && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 --no-cache-dir install cupy-cuda12x
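A quick, hypothetical smoke test to confirm that CuPy sees the GPU inside that image (not part of the Dockerfile above):

import cupy as cp

print("devices:", cp.cuda.runtime.getDeviceCount())  # should report at least 1
x = cp.arange(10)
print(int((x * 2).sum()))  # trivial kernel on the GPU; prints 90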
Ok the nvidia setup has been updated so please:
- rebase and answer comments
- add the test to the nvidia test suite
For the second point, you may have to modify the pytest-gpu.yml workflow file to add that extra test to it.
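For the CuPy-specific test itself, one common way to keep it from breaking non-NVIDIA CI runs is to skip it when CuPy is absent; a sketch (test name and body are hypothetical):

import numpy as np
import pytest

cp = pytest.importorskip("cupy")  # skips this module entirely if CuPy is missing

def test_cupy_allocator_roundtrip():
    # Allocate through CuPy and check host <-> device consistency.
    x = cp.arange(16, dtype=cp.float32)
    assert np.allclose(cp.asnumpy(x), np.arange(16, dtype=np.float32))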