
Update ginkgo interface

Open fritzgoebel opened this issue 2 years ago • 6 comments

This PR updates the Ginkgo interface so that it can handle data coming from the GPU. Currently this means:

  • on the first solver call: copy the matrices to the CPU, transform them to CSR, and run the factorization preprocessing on the host before moving the data to the GPU
  • when the matrix is updated: copy the matrix values to the CPU, move them into the CSR format, and then refactorize on the GPU (eventually, all of this should happen on the GPU)
  • in the solve: handle the vectors on the GPU if possible; if they are on the CPU, move them to the GPU for the solve
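The host-side CSR transformation mentioned in the first two bullets can be sketched as a prefix-sum over row counts. This is an illustrative standalone sketch, not HiOp's or Ginkgo's actual code; the `Csr` struct and `coo_to_csr` function names are made up for this example.

```cpp
#include <vector>
#include <cassert>

// Illustrative CSR container; not HiOp's or Ginkgo's actual types.
struct Csr {
    std::vector<int> row_ptrs;   // size num_rows + 1
    std::vector<int> col_idxs;
    std::vector<double> values;
};

// Convert a COO (triplet) matrix to CSR. Assumes the triplets are already
// sorted by row, then by column, with no duplicate entries.
Csr coo_to_csr(int num_rows,
               const std::vector<int>& rows,
               const std::vector<int>& cols,
               const std::vector<double>& vals)
{
    Csr csr;
    csr.row_ptrs.assign(num_rows + 1, 0);
    // Count the entries in each row, shifted by one position...
    for (int r : rows) ++csr.row_ptrs[r + 1];
    // ...then prefix-sum the counts into row offsets.
    for (int i = 0; i < num_rows; ++i) csr.row_ptrs[i + 1] += csr.row_ptrs[i];
    // Columns and values carry over unchanged because the input is sorted.
    csr.col_idxs = cols;
    csr.values = vals;
    return csr;
}
```

In the actual interface the resulting CSR data would then be copied to the device executor for the (re)factorization.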

I would be grateful for some instructions on how to test the `mem_space == "device"` case. @pelesh @nkoukpaizan

Note: I will update the interface further to be based on the most current Ginkgo release (1.7.0)

fritzgoebel avatar Oct 17 '23 19:10 fritzgoebel

@fritzgoebel, all you need to do is to set GPU examples to use gpu mode. In sparse HiOp examples that would look something like this:

      if (use_ginkgo_cuda) {
          nlp.options->SetStringValue("compute_mode", "gpu");
          nlp.options->SetStringValue("ginkgo_exec", "cuda");
      } else if (use_ginkgo_hip) {
          nlp.options->SetStringValue("compute_mode", "gpu");
          nlp.options->SetStringValue("ginkgo_exec", "hip");
      } else {
          nlp.options->SetStringValue("ginkgo_exec", "reference");
      }

I think all the other options you set earlier should stay the same.

When you set `compute_mode` to `"gpu"`, HiOp will hand data to the linear solver on the device.
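The branching in the example above can be condensed into a small helper. The function names here are purely illustrative and are not part of HiOp's API; the option strings (`"cuda"`, `"hip"`, `"reference"`, `"gpu"`) are taken from the example.

```cpp
#include <string>

// Illustrative helper mirroring the example's branching: pick the
// "ginkgo_exec" option value from the requested backend.
// Not part of HiOp's actual API.
std::string select_ginkgo_exec(bool use_ginkgo_cuda, bool use_ginkgo_hip)
{
    if (use_ginkgo_cuda) return "cuda";
    if (use_ginkgo_hip)  return "hip";
    return "reference";
}

// "compute_mode" must be set to "gpu" whenever a device executor is chosen,
// so that HiOp hands device-resident data to the linear solver.
bool needs_gpu_compute_mode(const std::string& exec)
{
    return exec == "cuda" || exec == "hip";
}
```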

pelesh avatar Oct 17 '23 19:10 pelesh

I added Ginkgo as an option for the NlpSparseRajaEx2 example and tested that this works with `mem_space = "device"` on both Summit and Frontier. @pelesh
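For reference, enabling the device path in the RAJA example presumably combines the options discussed in this thread, along the lines of the sketch below. The option names come from this discussion; the exact code in NlpSparseRajaEx2 may differ.

```cpp
// Sketch: request device-resident data before enabling the Ginkgo GPU path.
nlp.options->SetStringValue("mem_space", "device");
nlp.options->SetStringValue("compute_mode", "gpu");
nlp.options->SetStringValue("ginkgo_exec", "cuda");  // or "hip" on Frontier
```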

fritzgoebel avatar Nov 02 '23 19:11 fritzgoebel

I successfully tested this on Frontier with ginkgo@glu_experimental built with rocm/5.2. I haven't been able to build ginkgo@glu_experimental with more recent versions of ROCm. @fritzgoebel correct me if I'm wrong, but the interface will need to be changed again to use [email protected] (assuming that version has everything we need for HiOp).

nkoukpaizan avatar Nov 29 '23 19:11 nkoukpaizan

@nkoukpaizan what is the error you observed when using a more recent version of ROCm?

nychiang avatar Dec 11 '23 17:12 nychiang

If LLNL devs / HiOp devs are happy with this PR, can we please get a PR into develop from a local branch (instead of a fork) so we can get CI running? CI should be failing here since we need a [email protected] module on CI platforms, and so testing with that would be great.

This also would make merging updates into ExaGO easier...

I can create the PR myself as well

cameronrutherford avatar Dec 12 '23 18:12 cameronrutherford

> If LLNL devs / HiOp devs are happy with this PR, can we please get a PR into develop from a local branch (instead of a fork) so we can get CI running? CI should be failing here since we need a [email protected] module on CI platforms, and so testing with that would be great.
>
> This also would make merging updates into ExaGO easier...
>
> I can create the PR myself as well

@cameronrutherford @fritzgoebel Thanks! Please create a PR from a local branch (instead of a fork), in order to use the CI features. Otherwise this PR looks good to me.

nychiang avatar Dec 12 '23 19:12 nychiang