hiop Fix segfault, remove nonsymmetric ginkgo solver

This PR should fix issues #540 and #542. It also removes the nonsymmetric ginkgo solver as previously suggested by @pelesh.

The problem were a missing host side copy of the rhs vector for #540 and directly accessing matrix and vector values located in GPU memory in update_matrix for #542. This is fine as long as they are in unified memory, which the apparently are when built with Debug. I added a host side copy of the matrix that gets updated and moved back to the GPU instead.

Sep 22 '22 10:09 fritzgoebel

I think this is relevant for other GPU solvers in HiOp.

@kswirydo @nychiang

Sep 22 '22 14:09 pelesh

I think this is relevant for other GPU solvers in HiOp.

@kswirydo @nychiang

Not sure it actually is, the GPU memory was coming from inside a ginkgo matrix

Sep 22 '22 14:09 fritzgoebel

PNNL Pipelines seem to be giving a false negative here. Tests failed initially, but are passing upon a retry. I would suggest adding a new commit with https://github.com/LLNL/hiop/blob/develop/.gitlab-ci.yml#L159 changed to not include -E NlpSparse..., and we can see if the failing tests are now passing!

Thankfully this doesn't require a rebuild, so this should be a quick check.

Sep 27 '22 17:09 cameronrutherford

Tests are passing on ascent and summit with HiOp built in Release mode. I think this fixes #542.

Oct 31 '22 16:10 nkoukpaizan

Seems like CUDA_ARCHITECTURES cannot readily be passed from the CLI in the platform variables script into the CMake config. As a workaround, I have specified all Cuda architectures for Marianas in gcc-cuda.cmake. The only downside is that now all CI platforms/developers that use this config will by default now build with all the architectures, rather than only on Marianas as would be desirable.

Nov 08 '22 16:11 cameronrutherford

I was able to get tests passing when running manually on previously failing platform.

I am not able to re-produce old issue with updated module set. I assume that since ginkgo is built with the fully array of cuda architectures, that the CUDA runtime is smart enough to pick up the right spec, despite HiOp being built with the "wrong" cuda arch.

Unless people have qualms with how I hacked this together, I think this should be good to merge.

As a side note, ExaGO now has updated modules ready to go with a fresh build of latest hiop@develop.

Nov 08 '22 16:11 cameronrutherford

Thanks @CameronRutherford!

Nov 08 '22 16:11 fritzgoebel

The only downside is that now all CI platforms/developers that use this config will by default now build with all the architectures, rather than only on Marianas as would be desirable.

This is not a big issue. If you want to distribute HiOp binary, for example, this is exactly how you would build it.

Nov 08 '22 16:11 pelesh

All tests passing in CI

Nov 08 '22 16:11 cameronrutherford

Passing spack builds in GitHub actions!

Nov 08 '22 17:11 cameronrutherford