compiler: Unified Memory Allocator
Hello everyone,
We at SENAI CIMATEC are working on a Unified Memory Allocator for Devito through the CuPy library.
The first results with this new allocator are impressive when using checkpointing, compared to the default allocator on GPU.
In our experiment with the Overthrust model (894x884x299), the performance gain is close to three times* relative to the default allocator in Devito.
With this approach we expect to be able to allocate memory beyond the GPU capacity in the future.
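For context, this relies on CUDA managed (unified) memory, which CuPy exposes directly; managed pages migrate between host and device on demand, which is what makes oversubscribing the GPU memory possible. The snippet below is only a minimal sketch of that CuPy mechanism, not the actual Devito allocator, and the shape/dtype are arbitrary:

import ctypes

import cupy as cp
import numpy as np

# Route CuPy allocations through cudaMallocManaged (CUDA unified memory).
# The memory pool is optional; it just avoids repeated cudaMallocManaged calls.
pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
cp.cuda.set_allocator(pool.malloc)

shape, dtype = (512, 512, 512), np.float32
nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize

# Managed memory is addressable from both host and device.
memptr = pool.malloc(nbytes)

# Wrap the same memory as a host-side NumPy array (no copy); this is the kind
# of view that the host-side data could be backed by.
buf = (ctypes.c_byte * nbytes).from_address(memptr.ptr)
host_view = np.frombuffer(buf, dtype=dtype).reshape(shape)
host_view[:] = 0.0  # touched on the CPU; pages migrate when the GPU accesses them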
We will open a Draft PR to align this allocator with Devito's patterns, fix possible bugs, and open it up for community use :)
A version enabling this through the External Allocator is also in development, and we expect to share it soon.
All feedback is welcome :)
Thank you to everyone at CIMATEC and on the Devito team for making this possible :slightly_smiling_face:
*All experiments used an Nvidia V100 with 32 GB of memory.
CuPy version 8.3.0 was used.
ToDo:
- Distributed GPU allocation
@FabioLuporini, we added tests to this allocator; could we proceed with this PR?
The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.
So we need conditional imports, a mechanism to emit suitable error messages in case one attempts to use the allocator but it isn't available, and so on.
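For illustration, the conditional-import pattern could look roughly like the sketch below (class and method names are hypothetical, not the actual allocators.py change):

try:
    import cupy as cp
except ImportError:
    # CuPy is an optional dependency; the allocator below degrades gracefully.
    cp = None

class CupyAllocator:  # in Devito this would hook into the existing allocator machinery
    """Sketch of a CuPy-backed allocator that fails loudly when CuPy is missing."""

    @classmethod
    def available(cls):
        return cp is not None

    def alloc(self, nbytes):
        if cp is None:
            raise RuntimeError("CupyAllocator requested but CuPy is not installed; "
                               "install a wheel matching your CUDA toolkit, "
                               "e.g. `pip install cupy-cuda12x`")
        # Allocate CUDA managed (unified) memory through CuPy.
        return cp.cuda.malloc_managed(nbytes)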
Should this be added to the GPU CI and actually tested on the nvidia run?
We've updated the allocators.py file to use a conditional import for cupy. I think this solves the problem.
Can you check if #2171 fixes the install issue?
@mloubout, definitely, we will test it. BTW, this docker image worked too, thanks to you.
FROM nvcr.io/nvidia/nvhpc:23.5-devel-cuda12.1-ubuntu22.04 AS devel
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        git \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        python3-wheel \
        python3.10-venv && \
    rm -rf /var/lib/apt/lists/*
RUN pip3 --no-cache-dir install cupy-cuda12x
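A quick, hypothetical smoke test to confirm that CuPy sees the GPU inside that image (not part of the Dockerfile above):

import cupy as cp

print("devices:", cp.cuda.runtime.getDeviceCount())  # should report at least 1
x = cp.arange(10)
print(int((x * 2).sum()))  # trivial kernel on the GPU; prints 90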
Ok the nvidia setup has been updated so please:
- rebase and answer comments
- add the test to the nvidia test suite
For the second point, you may have to modify the pytest-gpu.yml workflow file to add that extra test to it.
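For the CuPy-specific test itself, one common way to keep it from breaking non-NVIDIA CI runs is to skip it when CuPy is absent; a sketch (test name and body are hypothetical):

import numpy as np
import pytest

cp = pytest.importorskip("cupy")  # skips this module entirely if CuPy is missing

def test_cupy_allocator_roundtrip():
    # Allocate through CuPy and check host <-> device consistency.
    x = cp.arange(16, dtype=cp.float32)
    assert np.allclose(cp.asnumpy(x), np.arange(16, dtype=np.float32))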