
AOMP after 13.0-2 reported to crash on running out of memory

Open · JonChesterfield opened this issue 4 years ago · 2 comments

Reported out of band by the Devito developers a week ago; I'm creating a ticket so it doesn't get lost. Quoting from the Slack channel:

Devito observed spurious segfaults while running a sequence of tests. This appears to be caused by host memory exhaustion, and since the memory is not being swapped out, I'd bet on pinned memory buffers allocated by the OpenMP runtime for host-device communication.

DEVITO_PLATFORM=amdgpuX DEVITO_ARCH=aomp DEVITO_LANGUAGE=openmp pytest test_gpu_common.py::TestStreaming::test_streaming_complete

shows a significant increase in host memory consumption (~1 GB after each Operator run), despite being a tiny test.

Memory is allocated and freed as expected. Allocations:

    #pragma omp target enter data map(to: u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]])
    #pragma omp target enter data map(to: va[0:va_vec->size[0]][0:va_vec->size[1]][0:va_vec->size[2]])
    #pragma omp target enter data map(to: vb[0:vb_vec->size[0]][0:vb_vec->size[1]][0:vb_vec->size[2]])

Deallocations:

    #pragma omp target update from(u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]])
    #pragma omp target exit data map(release: u[0:u_vec->size[0]][0:u_vec->size[1]][0:u_vec->size[2]]) if(devicerm)
    #pragma omp target exit data map(delete: va[0:va_vec->size[0]][0:va_vec->size[1]][0:va_vec->size[2]]) if(devicerm && (va_vec->size[0] != 0) && (va_vec->size[1] != 0) && (va_vec->size[2] != 0))
    #pragma omp target exit data map(delete: vb[0:vb_vec->size[0]][0:vb_vec->size[1]][0:vb_vec->size[2]]) if(devicerm && (vb_vec->size[0] != 0) && (vb_vec->size[1] != 0) && (vb_vec->size[2] != 0))
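For readers less familiar with OpenMP's unstructured data mapping, the allocation/deallocation pattern quoted above can be sketched as a small, self-contained C function. This is not Devito's generated code: vec_t and run_operator are hypothetical names, the 3-D arrays are flattened to 1-D, and only u and va are modelled (va is assumed to have the same shape as u). Every target enter data is paired with a target exit data, so once devicerm is set each device copy's reference count should drop back to zero and be freed; host memory that still grows on every call under this pattern would suggest a leak inside the runtime rather than in the application.

```c
#include <assert.h>

/* Hypothetical stand-in for the descriptors Devito passes to generated code
   (u_vec, va_vec); only the data pointer and the sizes are modelled. */
typedef struct {
    float *data;
    long size[3];
} vec_t;

/* One "Operator run": map the inputs, compute on the device, copy u back
   and unmap. Arrays are flattened to 1-D; va is assumed to match u's shape. */
void run_operator(vec_t *u_vec, vec_t *va_vec, int devicerm)
{
    assert(u_vec->data && va_vec->data);

    float *u = u_vec->data;
    float *va = va_vec->data;
    long n = u_vec->size[0] * u_vec->size[1] * u_vec->size[2];
    long nva = va_vec->size[0] * va_vec->size[1] * va_vec->size[2];

    /* Allocate device copies (reference count 0 -> 1) and copy host -> device. */
    #pragma omp target enter data map(to: u[0:n])
    #pragma omp target enter data map(to: va[0:nva])

    /* Both arrays are already present, so these map clauses only look them up. */
    #pragma omp target teams distribute parallel for map(tofrom: u[0:n]) map(to: va[0:nva])
    for (long i = 0; i < n; i++)
        u[i] += va[i];

    /* Copy the result back, then free the device copies when devicerm is set. */
    #pragma omp target update from(u[0:n])
    #pragma omp target exit data map(release: u[0:n]) if(devicerm)
    #pragma omp target exit data map(delete: va[0:nva]) if(devicerm && nva != 0)
}
```

Called in a loop with devicerm set, host memory should stay roughly flat from one call to the next, which is the behaviour the report says stops holding after 13.0-2. Without an offload device (or without -fopenmp) the target regions simply run on the host and the mapping clauses have no effect.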

htop confirmed that the crash happens when memory is exhausted.

13.0-3 leaks too (!)

So I'm reverting to 13.0-2 for the time being; I'll wait for news.

I think the easiest thing for you to do is to run pytest test_adjoint.py test_gpu_common.py with AOMP offloading enabled and see for yourself: you'll see a massive, steady increase in unswappable host memory consumption.
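The steady increase described here can also be quantified from inside the test process instead of by watching htop. A minimal sketch, assuming Linux: current_rss_kb and sample_rss_growth are hypothetical helpers (not part of Devito, AOMP or the OpenMP API) that read the VmRSS line of /proc/self/status before and after each run. Resident page-locked buffers should count toward VmRSS, so leaked pinned staging buffers would be expected to show up in these numbers, though that is an assumption, not something verified against the AOMP runtime.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Current resident set size of this process in kB, parsed from the
   "VmRSS:" line of /proc/self/status (Linux only); -1 on failure. */
long current_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Call run(ctx) `iters` times and store each call's RSS growth (kB) in
   growth[0..iters-1]. Returns 0 on success, -1 if RSS could not be read.
   Growth that stays positive and roughly constant on every call suggests a
   per-run leak rather than one-off runtime initialisation. */
int sample_rss_growth(void (*run)(void *ctx), void *ctx, int iters, long *growth)
{
    assert(run && growth);

    for (int i = 0; i < iters; i++) {
        long before = current_rss_kb();
        run(ctx);
        long after = current_rss_kb();
        if (before < 0 || after < 0)
            return -1;
        growth[i] = after - before;
    }
    return 0;
}
```

The ctx pointer lets the caller pass whatever arguments one Operator run needs. The first iteration is best ignored, since runtime and device initialisation allocate memory once; a leak like the one reported (~1 GB per run) would dominate every later sample.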

JonChesterfield, Nov 15 '21 14:11

@estewart08 it would be good to make sure this isn't broken in the 14 proposal.

JonChesterfield, Nov 15 '21 14:11

Initial testing shows the tests passing for me with 13.0-2, 13.0-6 and 14.0-0, on a machine with 64 GB of system memory and a gfx906 GPU. Devito SHA: 5848cf3d0d0854b158cbd3245df30ef52ad24146

estewart08, Nov 15 '21 14:11

Closing this old issue.

gregrodgers, Oct 18 '22 20:10