Improve OpenMP offload implementation

Open issue, opened by olupton.

Overview
In https://github.com/BlueBrain/CoreNeuron/pull/713 we have added support for GPU offload using OpenMP. This is a good first step, but there are several areas where we hope to improve the implementation. This issue is to track planned improvements.

Asynchronous execution
In https://github.com/BlueBrain/CoreNeuron/pull/713 we did not use any of OpenMP's asynchronous execution features for accelerator offload (the nowait and depend clauses and the taskwait directive). This was partly for simplicity, and partly because support for those features in the compiler we were using at the time (NVHPC 21.9) was rather limited.
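A minimal sketch of what asynchronous offload could look like, assuming two hypothetical mechanism kernels that update the same `rhs` array (the function and array names are illustrative, not CoreNeuron's code). `nowait` turns each `target` region into a deferred target task, `depend` keeps the second kernel ordered after the first, and `taskwait` makes the host block until both have finished. Built without OpenMP, the pragmas are ignored and the loops run in order on the host.

```cpp
#include <cassert>

// Hypothetical example: two dependent "mechanism" kernels offloaded asynchronously.
// v and rhs are illustrative arrays of length n, not CoreNeuron data structures.
void run_async_chain(double* v, double* rhs, int n) {
    assert(n >= 0);

    // Make both arrays resident on the device once (analogous to model transfer).
    #pragma omp target enter data map(to: v[0:n], rhs[0:n])

    // Kernel 1: deferred target task (nowait) that writes rhs.
    #pragma omp target teams distribute parallel for nowait depend(out: rhs[0])
    for (int i = 0; i < n; ++i) {
        rhs[i] = 2.0 * v[i];
    }

    // Kernel 2: also deferred; depend(inout: rhs[0]) orders it after kernel 1.
    #pragma omp target teams distribute parallel for nowait depend(inout: rhs[0])
    for (int i = 0; i < n; ++i) {
        rhs[i] += 1.0;
    }

    // The host could do unrelated work here; taskwait blocks until both
    // deferred target tasks have completed.
    #pragma omp taskwait

    // Copy the result back and release the device copy of v.
    #pragma omp target exit data map(from: rhs[0:n]) map(release: v[0:n])
}
```

Using `rhs[0]` as the dependence item is a common shorthand: it acts as a token for the whole array, which is sufficient as long as every kernel touching `rhs` names the same element.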

Work has already started on this, see:

https://github.com/BlueBrain/CoreNeuron/pull/725
https://github.com/BlueBrain/nmodl/pull/788
https://github.com/BlueBrain/mod2c/pull/75

Initially we should aim to recover the performance attained with (asynchronous) OpenACC. After that, we could look at launching more mechanism kernels in parallel within a single NrnThread.
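Launching independent mechanism kernels in parallel could follow the same pattern. In this sketch, two hypothetical mechanisms write separate arrays `ia` and `ib` and share no dependence, so the OpenMP runtime is free to overlap them; a third kernel that sums both results lists both as `in` dependences and therefore waits for both. All names are illustrative, and whether the kernels actually overlap on the GPU depends on the compiler and runtime.

```cpp
#include <cassert>

// Hypothetical: two independent mechanisms (writing ia and ib) followed by an
// accumulation step that needs both results. Not CoreNeuron's actual kernels.
void run_independent_mechs(const double* v, double* ia, double* ib, double* rhs, int n) {
    assert(n >= 0);

    #pragma omp target enter data map(to: v[0:n]) map(alloc: ia[0:n], ib[0:n], rhs[0:n])

    // No dependence between these two kernels, so they may run concurrently.
    #pragma omp target teams distribute parallel for nowait depend(out: ia[0])
    for (int i = 0; i < n; ++i) {
        ia[i] = 0.5 * v[i];
    }

    #pragma omp target teams distribute parallel for nowait depend(out: ib[0])
    for (int i = 0; i < n; ++i) {
        ib[i] = 0.25 * v[i];
    }

    // Reads both currents, so it is ordered after both producer kernels.
    #pragma omp target teams distribute parallel for nowait depend(in: ia[0], ib[0]) depend(out: rhs[0])
    for (int i = 0; i < n; ++i) {
        rhs[i] = ia[i] + ib[i];
    }

    // Wait for all three deferred target tasks before copying back.
    #pragma omp taskwait
    #pragma omp target exit data map(from: rhs[0:n]) map(release: v[0:n], ia[0:n], ib[0:n])
}
```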

Present clauses
With OpenACC we had present(...) clauses that allowed us to assert that data were already present on the device and should not be copied. The current OpenMP implementation has no equivalent, but in practice it preserves the same data transfer pattern as OpenACC because we ensure that the data are already present on the device before the kernels run.
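For comparison, a sketch of the two code paths side by side, assuming a hypothetical kernel `scale_kernel` and hypothetical transfer steps `transfer_to_device` / `transfer_from_device` (none of these are CoreNeuron's actual functions). On the OpenACC path `present()` turns a missing transfer into a runtime error; on the OpenMP path the kernel relies on the earlier `target enter data`, and nothing checks that it happened.

```cpp
#include <cassert>

// Hypothetical model-transfer step: make v resident on the device once.
void transfer_to_device(double* v, int n) {
    assert(n >= 0);
#if defined(_OPENACC)
    #pragma acc enter data copyin(v[0:n])
#else
    #pragma omp target enter data map(to: v[0:n])
#endif
}

// Hypothetical mechanism kernel run every timestep.
void scale_kernel(double* v, int n) {
#if defined(_OPENACC)
    // present(): assert v is already on the device; if it is not, this is a
    // runtime error rather than a transfer.
    #pragma acc parallel loop present(v[0:n])
#else
    // OpenMP today: no equivalent assertion. Because transfer_to_device()
    // already mapped v, the implicit mapping finds the device copy and nothing
    // is transferred; had that step been skipped, no mapping error is raised.
    #pragma omp target teams distribute parallel for
#endif
    for (int i = 0; i < n; ++i) {
        v[i] *= 2.0;
    }
}

// Hypothetical: copy results back and free the device copy.
void transfer_from_device(double* v, int n) {
#if defined(_OPENACC)
    #pragma acc exit data copyout(v[0:n])
#else
    #pragma omp target exit data map(from: v[0:n])
#endif
}
```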

In principle, a bug in the model transfer code that failed to transfer some relevant data to the device during initialisation would cause a runtime error with OpenACC (✅) but silent implicit data transfers with OpenMP (⛔). Given that we already know how to generate present() clauses, it seems desirable to add the OpenMP equivalent, the OpenMP 5.1 present map-type modifier (map(present, alloc: ...)), once it is widely supported.
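A sketch of the proposed OpenMP form, assuming a compiler with OpenMP 5.1 support. The `present` map-type modifier makes the runtime report an error if `v[0:n]` is not already in the device data environment, and `alloc` means nothing is copied on entry or exit. The `_OPENMP >= 202011` guard (the macro value for OpenMP 5.1) is one assumed, conservative way to fall back to the current unchecked form on older compilers; the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical kernel: v must already have been mapped (e.g. by an earlier
// "target enter data") before this is called.
void scale_kernel_checked(double* v, int n) {
    assert(n >= 0);
#if defined(_OPENMP) && _OPENMP >= 202011
    // OpenMP 5.1: error out if v[0:n] is not present on the device,
    // mirroring OpenACC's present(); alloc means no copy in or out.
    #pragma omp target teams distribute parallel for map(present, alloc: v[0:n])
#else
    // Older OpenMP: no presence check, the implicit mapping rules apply.
    #pragma omp target teams distribute parallel for
#endif
    for (int i = 0; i < n; ++i) {
        v[i] *= 2.0;
    }
}

// Usage sketch: map first, run the checked kernel, then copy back.
void run_checked(double* v, int n) {
    #pragma omp target enter data map(to: v[0:n])
    scale_kernel_checked(v, n);
    #pragma omp target exit data map(from: v[0:n])
}
```

Some compilers report a lower `_OPENMP` value than 202011 while still accepting the modifier, so a real build would more likely gate this on a build-system option than on the macro alone.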

(original issue: https://github.com/neuronsimulator/gpuhackathon/issues/5)

Dec 23 '21 08:12 olupton