AOMP post 13.0-2 reported to crash on out of memory
Devito reported this out of band a week ago; I'm creating a ticket so it doesn't get lost. Quoting from the Slack channel:
Devito observed spurious segfaults while running a sequence of tests. This appears to be due to exhausting host memory, and since it's not swapping out, I'd bet on pinned memory buffers allocated by the openmp runtime for the comms
DEVITO_PLATFORM=amdgpuX DEVITO_ARCH=aomp DEVITO_LANGUAGE=openmp pytest test_gpu_common.py::TestStreaming::test_streaming_complete
shows a significant increase in host memory consumption (~1 GB after each Operator run), despite being a tiny test
memory allocated and freed as expected:
#pragma omp target enter data map(to: u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]])
#pragma omp target enter data map(to: va[0:va_vec->size[0]][0:va_vec->size[1]][0:va_vec->size[2]])
#pragma omp target enter data map(to: vb[0:vb_vec->size[0]][0:vb_vec->size[1]][0:vb_vec->size[2]])
and (deallocations):
#pragma omp target update from(u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]])
#pragma omp target exit data map(release: u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]]) if(devicerm)
#pragma omp target exit data map(delete: va[0:va_vec->size[0]][0:va_vec->size[1]][0:va_vec->size[2]]) if(devicerm && (va_vec->size[0] != 0) && (va_vec->size[1] != 0) && (va_vec->size[2] != 0))
#pragma omp target exit data map(delete: vb[0:vb_vec->size[0]][0:vb_vec->size[1]][0:vb_vec->size[2]]) if(devicerm && (vb_vec->size[0] != 0) && (vb_vec->size[1] != 0) && (vb_vec->size[2] != 0))
htop confirmed crash upon memory exhaustion
13.0-3 leaks too (!)
so I'm reverting to 13.0-2 for the time being, I will wait for news...
I think the easiest thing for you to do is to run pytest test_adjoint.py test_gpu_common.py with AOMP offloading enabled and see for yourself. You'll see a massive and steady increase in unswappable, host memory consumption
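For anyone who would rather get numbers than watch htop, here is a rough sketch of how the per-run growth could be measured from inside the test process. It is not part of Devito or AOMP; the helper names (`parse_status_kb`, `current_rss_kb`, `rss_growth_per_run`) are made up for illustration, it assumes Linux with /proc available, and it only tracks VmRSS, which may or may not capture every pinned buffer the runtime allocates.

```python
import os
import re


def parse_status_kb(status_text, field="VmRSS"):
    """Return the kB value of a field (e.g. VmRSS, VmLck) from
    /proc/<pid>/status text, or None if the field is absent."""
    pattern = rf"^{re.escape(field)}:\s+(\d+)\s+kB"
    match = re.search(pattern, status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def current_rss_kb(pid=None):
    """Read the resident set size of a process (default: this one), in kB.
    Linux-only: relies on /proc/<pid>/status."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as fh:
        return parse_status_kb(fh.read(), "VmRSS")


def rss_growth_per_run(run, repeats=5):
    """Call run() `repeats` times and return the RSS delta (kB) of each call.
    `run` is any zero-argument callable, e.g. a hypothetical wrapper
    that executes one Operator."""
    deltas = []
    for _ in range(repeats):
        before = current_rss_kb()
        run()
        deltas.append(current_rss_kb() - before)
    return deltas
```

If the leak is as described, wrapping one Operator execution in `run` should give deltas that stay consistently large and positive instead of settling near zero after the first call.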
@estewart08 this would be a good thing to not be broken on the 14 proposal
Initial testing passes for me with 13.0-2, 13.0-6 and 14.0-0 on a system with 64 GB of memory and a gfx906. Devito sha: 5848cf3d0d0854b158cbd3245df30ef52ad24146
Closing this old issue.