
Illegal memory access (double free?) on large GPU case

Open • MalachiTimothyPhillips opened this issue 3 years ago • 1 comment

Hello all,

I encountered an issue during setup using a HYPRE version based on this PR: https://github.com/hypre-space/hypre/pull/702. (I believe it's a slightly older commit than the newest one on the branch; see diffs.txt for the differences in src/utilities/memory.c.)

I ran into an illegal memory access during HYPRE setup for a large case (> 2.2B dofs) on Summit with P=3,072 V100s. Specifically:

CUDA ERROR (code = 700, an illegal memory access was encountered) at <path-to-hypre>/hypre/src/utilities/memory.c:442

This corresponds to: https://github.com/hypre-space/hypre/blob/e20d63ad95d4ea35e44b828f7c5ade8cd908e146/src/utilities/memory.c#L469. Unfortunately, I am not able to reproduce this issue with problems smaller than 2.2B dofs. My guess is this is a double free?

I am utilizing the configuration from: https://github.com/Nek5000/nekRS/blob/f7ecfa9db96e8476f249bba0f0eb4f8af2e612ac/config/hypre.cmake#L52. -DHYPRE_ENABLE_MIXEDINT=ON is set, but I wonder if there is a 32-bit integer overflow?
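The overflow suspicion can be sanity-checked with plain arithmetic: the failing size (> 2.2B dofs) sits just above the signed 32-bit limit of 2,147,483,647, while the per-GPU share on 3,072 ranks is far below it. A minimal sketch in C, assuming an even block partition of rows across ranks; the helper names `fits_in_int32` and `max_rows_per_rank` are hypothetical and are not part of HYPRE:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n can be stored in a signed 32-bit integer. */
static bool fits_in_int32(int64_t n)
{
    return n >= INT32_MIN && n <= INT32_MAX;
}

/* Hypothetical helper: rows owned by the largest rank, assuming an even
 * block partition (ceiling division). The real partition may differ. */
static int64_t max_rows_per_rank(int64_t global_rows, int64_t num_ranks)
{
    assert(num_ranks > 0);
    return (global_rows + num_ranks - 1) / num_ranks;
}
```

Under these assumptions, `fits_in_int32(2200000000LL)` is false, whereas `max_rows_per_rank(2200000000LL, 3072)` gives 716,146, which fits comfortably. That is consistent with a problem in a global quantity (row or column offsets, total nonzero counts) being held in a 32-bit variable somewhere, even though mixed-int mode is meant to keep global indices in 64-bit integers. It does not rule out other causes.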

Please let me know if you need additional information -- I can give more details regarding how to reproduce the issue. Thank you for your help!

MalachiTimothyPhillips, Dec 06 '22 23:12

Hi @MalachiTimothyPhillips, do you see this problem in master or the latest release? PR #702 doesn't change memory.c. I am confused about the double free issue. A reproducer would be helpful. Thanks!

liruipeng, Dec 09 '22 16:12