Add opm cuilu0 and clean up cuistl
Adds our own CUDA implementation of ILU0, structured similarly to CuDILU. The implementation has also been verified to work on AMD cards.
The PR also cleans up some files, such as cusparsematrixoperations, that were growing too large. A new folder is created with separate files for the kernels of the different preconditioners.
A follow-up PR (#5451) will remove the code duplication introduced here in the thread block tuning step, which for now is present in both the CuDILU and CuILU0_OPM_Impl cpp files, and will improve the tuning by using CUDA events.
Jenkins build this please
Jenkins build this please
The comments have now been addressed and the commits are squashed.
Jenkins build this please