gpuowl Fixed v6 branch for CUDA 12.0 Nvidia driver

Attempted to fix support for the CUDA 12.0 Nvidia driver by backporting https://github.com/preda/gpuowl/commit/677f43a2ef299f0b8cc9885284fbaa086e917ce2 to the v6 branch.
Fixed support for Clang 14+.
Enabled Link Time Optimization in the Makefile.
Updated the Continuous Integration (CI) with the changes from #252.
- Changed GitHub Actions to also build with Ubuntu 22.04.
- Removed Ubuntu 18.04, which is no longer supported as of April 2023.
Updated GitHub Actions CI.
- Added the Clang-Tidy linter.
- Updated the ShellCheck linter to enable more checks.
- Updated to install the PoCL OpenCL implementation on Linux, so that the full -h output, including the FFT lengths, can be displayed. PoCL currently only supports OpenCL 1.2, so it cannot yet run any actual tests of GpuOwl beyond this help output.
- Fixed the Windows jobs, including installing the required GMP and OpenCL libraries.
  - The Windows Clang job now installs the MSYS2 version of Clang instead of using the provided Visual Studio version, which was incompatible with GpuOwl.
  - The MSYS2 "OpenCL ICD Loader" package does not support static linking, so the jobs cannot use make gpuowl-win.exe to build GpuOwl.
Removed Travis CI.
- It is no longer free for Open Source projects.
Added support for PRP proof powers from 1 up to 12.
- I ported the "best power" feature from Prime95/MPrime, so GpuOwl will use at most the optimal proof power for any given exponent. For wavefront exponents, this means a maximum proof power of 9 for double checks and 10 for first-time tests.
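To show why capping the proof power matters, here is a rough estimate of the temporary storage a proof needs. This Python sketch assumes the prover keeps 2^p intermediate residues of about E/8 bytes each; it is a back-of-the-envelope model, not the Prime95/MPrime best-power heuristic ported in this PR, and the function names are hypothetical.

```python
MIN_POWER, MAX_POWER = 1, 12  # proof power range supported by this PR


def residue_bytes(exponent):
    # Assumption: one residue is stored packed, about ceil(E / 8) bytes.
    return (exponent + 7) // 8


def proof_storage_bytes(exponent, power):
    """Rough temporary disk space for a PRP proof of the given power,
    assuming the prover keeps 2**power intermediate residues."""
    if not MIN_POWER <= power <= MAX_POWER:
        raise ValueError(f"proof power must be in [{MIN_POWER}, {MAX_POWER}]")
    return (1 << power) * residue_bytes(exponent)


for p in (8, 9, 10, 12):
    print(p, f"{proof_storage_bytes(106928347, p) / 1e9:.1f} GB")
```

Under this model, each residue for the exponent in the log below (106928347) is about 13.4 MB, so power 8 needs roughly 3.4 GB and power 12 roughly 55 GB, and every extra power doubles the space.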

Note that this was my attempt to fix the v6 branch, but it does NOT actually fix it:

2023-03-11 11:30:34 gpuowl v6.11-384-gb51191f-dirty
2023-03-11 11:30:34 Note: not found 'config.txt'
2023-03-11 11:30:34 config: -prp 106928347 -iters 100000 -device 0 -cleanup -log 10000 -maxAlloc 13590M 
2023-03-11 11:30:34 device 0, unique id ''
2023-03-11 11:30:35 Tesla T4-0 106928347 FFT: 6M 1K:12:256 (17.00 bpw)
2023-03-11 11:30:35 Tesla T4-0 Expected maximum carry32: 24DB0000
2023-03-11 11:30:36 Tesla T4-0 OpenCL args "-DEXP=106928347u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DPM1=0 -DWEIGHT_STEP_MINUS_1=0x1.7ddbbaacae2cep-9 -DIWEIGHT_STEP_MINUS_1=-0x1.7cbfc2938b93dp-9  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
1 warning generated.
2023-03-11 11:30:39 Tesla T4-0 

2023-03-11 11:30:39 Tesla T4-0 OpenCL compilation in 3.29 s
2023-03-11 11:30:42 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:30:42 Tesla T4-0 validating proof residues for power 8
2023-03-11 11:30:42 Tesla T4-0 Proof using power 8
2023-03-11 11:30:49 Tesla T4-0 106928347 EE      800   0.00%; 5864 us/it; ETA 7d 06:10; e539a11374d52057 (check 2.49s)
2023-03-11 11:30:52 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:30:59 Tesla T4-0 106928347 EE      800   0.00%; 5953 us/it; ETA 7d 08:48; e539a11374d52057 (check 2.55s) 1 errors
2023-03-11 11:31:02 Tesla T4-0 106928347 OK        0 loaded: blockSize 400, 0000000000000003
2023-03-11 11:31:09 Tesla T4-0 106928347 EE      800   0.00%; 6035 us/it; ETA 7d 11:15; e539a11374d52057 (check 2.57s) 2 errors
2023-03-11 11:31:09 Tesla T4-0 3 sequential errors, will stop.
2023-03-11 11:31:09 Tesla T4-0 Exiting because "too many errors"
2023-03-11 11:31:09 Tesla T4-0 Bye
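When debugging the OpenCL build, the numbers in this log can be cross-checked against each other. The Python sketch below assumes that the FFT length in words is 2 × WIDTH × MIDDLE × SMALL_HEIGHT and that -DWEIGHT_STEP_MINUS_1 and -DIWEIGHT_STEP_MINUS_1 encode the IBDWT weight ratio 2^(ceil(E/N) - E/N) and its inverse, each minus 1. Both are inferences from the names, not taken from the GpuOwl source, and the helpers fft_words and weight_steps are hypothetical.

```python
import math

# Values copied from the log above (FFT "6M 1K:12:256", -DEXP=106928347u).
E = 106928347
WIDTH, MIDDLE, SMALL_HEIGHT = 1024, 12, 256


def fft_words(width, middle, small_height):
    # Assumption: each complex FFT element holds two real words.
    return 2 * width * middle * small_height


def weight_steps(e, n):
    """Return (WEIGHT_STEP - 1, IWEIGHT_STEP - 1), assuming
    WEIGHT_STEP = 2**(ceil(e/n) - e/n) and IWEIGHT_STEP = 1 / WEIGHT_STEP."""
    # ceil(e/n) - e/n computed from integers to avoid cancellation.
    frac = ((-e) % n) / n
    x = frac * math.log(2)
    # expm1 avoids the cancellation of computing exp(x) - 1 for small x.
    return math.expm1(x), math.expm1(-x)


N = fft_words(WIDTH, MIDDLE, SMALL_HEIGHT)
step_m1, istep_m1 = weight_steps(E, N)
print(N, f"{E / N:.2f} bpw")
print(step_m1.hex(), istep_m1.hex())

# Compare with the constants passed to the OpenCL compiler in the log.
logged = (float.fromhex("0x1.7ddbbaacae2cep-9"),
          float.fromhex("-0x1.7cbfc2938b93dp-9"))
print(math.isclose(step_m1, logged[0], rel_tol=1e-10),
      math.isclose(istep_m1, logged[1], rel_tol=1e-10))
```

Under these assumptions, N comes out to 6291456 words (6M), E/N rounds to the logged 17.00 bpw, and both weight constants agree with the logged hex values to at least ten significant digits, which suggests these host-side constants are consistent with each other.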

I am not an OpenCL programmer, so I obviously did not correctly resolve the merge conflicts. Any help finishing this PR by correctly applying https://github.com/preda/gpuowl/commit/677f43a2ef299f0b8cc9885284fbaa086e917ce2 to the v6 branch would be greatly appreciated by Colab users, and likely by other people with Nvidia GPUs. Thanks in advance.

Mar 11 '23 11:03 tdulcet

Is this still relevant/useful for merging?

Feb 12 '24 07:02 preda

The main change in this PR, fixing the v6 branch on Nvidia GPUs, may still be useful if you or someone else with OpenCL experience is able to finish it. However, OpenCL is completely busted with the latest Nvidia driver, so I am unable to test anything to confirm whether it fixes the issue. If/when Nvidia fixes its driver, it could fix the original issue as well, eliminating the need to make this change to the v6 branch. We are still patiently waiting to see what Nvidia does...

The other minor changes in the PR, notably fixing Clang support and enabling LTO, are still very useful and should be made to the master branch as well. I was planning to make a separate PR after this was merged.

Feb 12 '24 12:02 tdulcet

It looks like there are too many unrelated changes in this PR; I'm not inclined to merge it as-is.

Maybe some small individual fixes can be extracted as separate PRs.

Sep 04 '24 14:09 preda

Yeah, as explained above, this PR is unfinished and does not currently work, which is why it is marked as a draft. Finishing it has also not been a priority, since OpenCL is still busted with recent Nvidia drivers on Linux.

I could remove the 3d073e09961cedefeb397484f43ab863ac37e824 commit if you were interested in merging the other fixes. Otherwise, I suppose this PR could be closed while we wait for Nvidia...

Sep 05 '24 12:09 tdulcet