
compiler: Unified Memory Allocator

Open guaacoelho opened this issue 3 years ago • 9 comments

Hello everyone,

We at SENAI CIMATEC are working on a Unified Memory Allocator for Devito through the CuPy library.

The first results with this new allocator are impressive when using checkpointing, compared to the default allocator on GPU.

In our experiment with the Overthrust model (894x884x299), performance is close to three times* better than with Devito's default allocator.

With this approach, we expect to be able to allocate memory beyond the GPU's capacity in the future.
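For context, here is a minimal sketch of the mechanism behind the idea (illustrative only, not the Devito allocator code itself): CuPy's malloc_managed allocates CUDA Unified Memory, a single buffer addressable from both host and device, which is what makes oversubscribing the physical GPU memory possible.

import ctypes

import cupy
import numpy as np

n = 1024
nbytes = n * np.dtype(np.float32).itemsize

# cudaMallocManaged under the hood: one pointer valid on host and device
mem = cupy.cuda.malloc_managed(nbytes)

# The raw address can be wrapped as a host-side numpy view; pages migrate
# between host and device on demand when either side touches them
buf = (ctypes.c_float * n).from_address(mem.ptr)
host_view = np.frombuffer(buf, dtype=np.float32)
host_view[:] = 0.0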

We will open a draft PR to align this allocator with Devito's patterns, fix possible bugs, and open it up for community use :)

A version enabling this through an External Allocator is in development too, and we expect to share it soon.
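To give an idea of what that route could look like, here is a hypothetical sketch only: it assumes devito.data.allocators.ExternalAllocator accepts a pre-allocated numpy array and that a no-op initializer leaves the buffer untouched; names, shapes and keyword arguments are illustrative, not the in-development code.

import ctypes

import cupy
import numpy as np

from devito import Grid, Function
from devito.data.allocators import ExternalAllocator

shape = (64, 64)
n = int(np.prod(shape))
nbytes = n * np.dtype(np.float32).itemsize

# Unified (host + device) buffer allocated by CuPy, viewed as a numpy array
mem = cupy.cuda.malloc_managed(nbytes)
buf = (ctypes.c_float * n).from_address(mem.ptr)
array = np.frombuffer(buf, dtype=np.float32).reshape(shape)

# space_order=0 so the allocated shape matches the external buffer exactly
grid = Grid(shape=shape)
f = Function(name='f', grid=grid, space_order=0,
             allocator=ExternalAllocator(array),
             initializer=lambda data: None)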

All feedback is welcome :)

Thank you to everyone on the CIMATEC and Devito teams for making this possible :slightly_smiling_face:

*All experiments used an Nvidia V100 with 32 GB of memory.

guaacoelho avatar Oct 21 '22 14:10 guaacoelho

CuPy version 8.3.0 was used.

guaacoelho avatar Oct 21 '22 14:10 guaacoelho

ToDo:

  • Distributed GPU allocation

speglich avatar Oct 21 '22 14:10 speglich

@FabioLuporini, we added tests to this allocator, could we proceed with this PR?

speglich avatar Feb 14 '23 17:02 speglich

@FabioLuporini, we added tests to this allocator, could we proceed with this PR?

The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.

So we need conditional imports, a mechanism to emit suitable error messages in case one attempts to use the allocator when it isn't available, and so on.
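Something along these lines, say (names are illustrative, not a prescription for allocators.py): import cupy conditionally and raise a clear error only when the CuPy-backed allocator is actually used.

try:
    import cupy as cp
except ImportError:
    cp = None


def alloc_unified(nbytes):
    # Allocate CUDA Unified Memory via CuPy, if CuPy is available
    if cp is None:
        raise RuntimeError("The CuPy-based allocator was requested, but CuPy "
                           "is not installed; install e.g. cupy-cuda12x to "
                           "enable it")
    return cp.cuda.malloc_managed(nbytes)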

FabioLuporini avatar Feb 15 '23 08:02 FabioLuporini

Should this be added to the GPU CI and actually tested on the nvidia run?

mloubout avatar Mar 08 '23 14:03 mloubout

@FabioLuporini, we added tests to this allocator, could we proceed with this PR?

The issue is that CuPy is an optional dependency (shipped by the NVidia SDK I'm guessing?), so if I try to run CI here, it will undoubtedly break.

So we need conditional imports, a mechanism to emit suitable error messages in case one attempts to use the allocator when it isn't available, and so on.

We've updated the allocators.py file to use a conditional import for cupy. I think this solves the problem.

guaacoelho avatar Mar 08 '23 14:03 guaacoelho

Can you check if #2171 fixes the install issue?

mloubout avatar Jul 28 '23 12:07 mloubout

@mloubout, definitely, we will test it. BTW, this Docker image worked too, thanks to you:


# NVHPC 23.5 + CUDA 12.1 development image
FROM nvcr.io/nvidia/nvhpc:23.5-devel-cuda12.1-ubuntu22.04 AS devel

# Basic build tooling
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        git \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*

# Python toolchain
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        python3 \
        python3-dev \
        python3-pip \
        python3-setuptools \
        python3-wheel \
        python3.10-venv && \
    rm -rf /var/lib/apt/lists/*

# CuPy built against CUDA 12.x
RUN pip3 --no-cache-dir install cupy-cuda12x

speglich avatar Jul 28 '23 12:07 speglich

Ok, the nvidia setup has been updated, so please:

  • rebase and answer comments
  • add the test to the nvidia test suite

For the second point, you may have to modify the pytest-gpu.yml workflow file to add that extra test to it.

mloubout avatar Aug 02 '23 13:08 mloubout