Fix segfault, remove nonsymmetric ginkgo solver
This PR should fix issues #540 and #542. It also removes the nonsymmetric ginkgo solver as previously suggested by @pelesh.
The problem were a missing host side copy of the rhs vector for #540 and directly accessing matrix and vector values located in GPU memory in update_matrix for
#542. This is fine as long as they are in unified memory, which the apparently are when built with Debug. I added a host side copy of the matrix that gets updated and moved back to the GPU instead.
I think this is relevant for other GPU solvers in HiOp.
@kswirydo @nychiang
I think this is relevant for other GPU solvers in HiOp.
@kswirydo @nychiang
Not sure it actually is, the GPU memory was coming from inside a ginkgo matrix
PNNL Pipelines seem to be giving a false negative here. Tests failed initially, but are passing upon a retry. I would suggest adding a new commit with https://github.com/LLNL/hiop/blob/develop/.gitlab-ci.yml#L159 changed to not include -E NlpSparse..., and we can see if the failing tests are now passing!
Thankfully this doesn't require a rebuild, so this should be a quick check.
Tests are passing on ascent and summit with HiOp built in Release mode. I think this fixes #542.
Seems like CUDA_ARCHITECTURES cannot readily be passed from the CLI in the platform variables script into the CMake config. As a workaround, I have specified all Cuda architectures for Marianas in gcc-cuda.cmake. The only downside is that now all CI platforms/developers that use this config will by default now build with all the architectures, rather than only on Marianas as would be desirable.
I was able to get tests passing when running manually on previously failing platform.
I am not able to re-produce old issue with updated module set. I assume that since ginkgo is built with the fully array of cuda architectures, that the CUDA runtime is smart enough to pick up the right spec, despite HiOp being built with the "wrong" cuda arch.
Unless people have qualms with how I hacked this together, I think this should be good to merge.
As a side note, ExaGO now has updated modules ready to go with a fresh build of latest hiop@develop.
Thanks @CameronRutherford!
The only downside is that now all CI platforms/developers that use this config will by default now build with all the architectures, rather than only on Marianas as would be desirable.
This is not a big issue. If you want to distribute HiOp binary, for example, this is exactly how you would build it.
All tests passing in CI
Passing spack builds in GitHub actions!