
Make some computations in DFTK GPU-compatible

GVigne opened this issue 3 years ago

This PR is a follow-up to the previous one, which implemented GPU compatibility for LOBPCG. If you have any questions or remarks about how LOBPCG works, please refer to that PR.

The goal of this PR is to implement GPU compatibility for some computations made by DFTK. This mainly means modifying the PlaneWaveBasis so it can store GPUArrays, and extending the apply! functions so the Hamiltonian and its operators can be applied to GPUArrays.

From an end-user perspective, the only change is in how the basis is built. There is now an optional argument array_type which tells the code which array type should be used. For example:

```julia
basis = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1))  # computations happen on the CPU
basis_gpu = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1), array_type=CuArray)  # computations happen on the GPU using CUDA
```

The end-user can then call the SCF with either basis or basis_gpu.
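Putting the pieces together, a minimal end-to-end sketch might look as follows. The silicon model setup is illustrative only, and exact signatures (model_LDA, load_psp) depend on the DFTK version:

```julia
using DFTK
using CUDA

# Illustrative silicon LDA model; the details are not part of this PR.
a = 10.26  # lattice constant in Bohr
lattice = a / 2 * [0 1 1.; 1 0 1.; 1 1 0.]
Si = ElementPsp(:Si, psp=load_psp("hgh/lda/si-q4"))
atoms = [Si, Si]
positions = [ones(3)/8, -ones(3)/8]
model = model_LDA(lattice, atoms, positions)

# The new array_type argument is the only GPU-specific change for the user.
basis_gpu = PlaneWaveBasis(model; Ecut=30, kgrid=(1, 1, 1), array_type=CuArray)
scfres = self_consistent_field(basis_gpu; tol=1e-6)
```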

I used CUDA since I have an NVIDIA GPU, but this part of the code should also work on other GPUs, since no CUDA-specific functions are used.

Things that I already know could be greatly improved:

  • The preconditioners. For now they work, but two things could be done. The first is to offload mean_kin to the GPU. That would require some work, as we would have to rewrite ldiv! and mul! (which currently do a lot of scalar indexing). ~~The other would be to build kin directly on the GPU instead of building it on the CPU and then offloading it (which is what currently happens). To do this, Gplusk_vectors_cart would need to return a GPUArray: this means that G_vectors(basis, kpoint) should return a GPUArray, i.e. that kpt.G_vectors should live on the GPU. That is going to be quite hard, as it means rewriting every function calling G_vectors(basis, kpoint) to be GPU-compatible.~~
  • The solvers. I didn't manage to make the NLsolve solvers work, so I had to use the ones implemented in DFTK. scf_damping_solver works fine, but scf_anderson_solver does not, as I didn't manage to write it in a GPU-compatible way.
  • The terms. I have implemented the "easy" terms (Kinetic, AtomicLocal, AtomicNonlocal and Hartree), but we could still add the Magnetic, PairWisePotential, Anyonic and XC terms. These will be much harder to implement, as they either make heavy use of scalar indexing or rely on external libraries (libxc).
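To illustrate the kind of rewrite the preconditioner bullet refers to, here is a hypothetical sketch (the struct and function names are invented for illustration, not the PR's code): an element-wise loop triggers scalar indexing on a GPUArray, while the equivalent broadcast compiles to a single fused kernel and works unchanged on CPU and GPU arrays.

```julia
using LinearAlgebra

# Hypothetical diagonal (kinetic) preconditioner, illustrative only.
struct KineticPreconditioner{T <: AbstractVector}
    kin::T              # kinetic energies, stored on CPU or GPU
    mean_kin::Ref{Float64}
end

# Scalar-indexing version: fine on CPU, but each x[i] access on a
# GPUArray is a separate device transfer (or an error under
# CUDA.allowscalar(false)).
function ldiv_scalar!(P::KineticPreconditioner, x::AbstractVector)
    for i in eachindex(x)
        x[i] /= P.kin[i] + P.mean_kin[]
    end
    x
end

# Broadcast version: same operation, GPU-compatible.
function LinearAlgebra.ldiv!(P::KineticPreconditioner, x::AbstractVector)
    x ./= P.kin .+ P.mean_kin[]
    x
end
```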

Edit: Two big changes:

  • The G_vectors for each k-point and the occupation numbers have been offloaded to the GPU. However, some functions (like compute_density) have to bring those arrays back to the CPU, as they do scalar indexing. This could be improved if it turns out to be performance-critical.
  • The Anderson solver will work with CUDA once this bug has been solved. We will then only have to allow scalar indexing on βs, which should be fine as it is not a big vector.
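The temporary CPU round-trip mentioned for compute_density can be sketched like this (a hypothetical helper, not DFTK's actual code): copy to the host with Array, do the scalar-indexing work there, then copy the result back to the device with similar/copyto!.

```julia
# Hypothetical sketch of the device -> host -> device pattern used where
# a function still relies on scalar indexing (e.g. in compute_density).
function accumulate_on_cpu(ψk_gpu::AbstractMatrix, occk_gpu::AbstractVector)
    ψk   = Array(ψk_gpu)     # device -> host copy
    occk = Array(occk_gpu)

    ρk = zeros(real(eltype(ψk)), size(ψk, 1))
    for (n, occ) in enumerate(occk)   # scalar indexing is fine on the CPU
        ρk .+= occ .* abs2.(view(ψk, :, n))
    end

    ρk_gpu = similar(ψk_gpu, eltype(ρk), length(ρk))
    copyto!(ρk_gpu, ρk)               # host -> device copy
    ρk_gpu
end
```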

GVigne, Aug 24 '22 07:08