[FEA] Make selector choose appropriate CUDA 12.x versions based on dependencies
**Is your feature request related to a problem? Please describe.**
Once RAPIDS adds support for CUDA 12.2, it will be possible to install conda packages of PyTorch alongside RAPIDS from conda. This is not currently possible because PyTorch supports CUDA 12.1 and will likely jump straight to 12.3 for its next set of packages. Since the CUDA 12 lineup of RAPIDS packages is going to leverage CUDA Enhanced Compatibility (CEC) to support arbitrary CUDA minor versions, users will no longer need a specific minor version for RAPIDS, but dependencies like PyTorch will likely continue to require one.
**Describe the solution you'd like**
We should update the release selector to offer a range of CUDA minor versions and have it automatically select a supported one based on the packages the user chooses to include in their environment.
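A minimal sketch of what that selection logic could look like. All package names and version sets below are hypothetical and purely illustrative; a real selector would source compatibility ranges from package metadata rather than a hard-coded table:

```python
# Hypothetical selector logic: pick the newest CUDA 12.x minor version
# that every selected package supports. The version sets are made up
# for illustration and do not reflect actual package metadata.
SUPPORTED_CUDA = {
    "rapids": {"12.0", "12.1", "12.2", "12.3", "12.4", "12.5"},  # arbitrary minors via CEC
    "pytorch": {"12.1"},  # pinned to one specific minor version
}

def pick_cuda_version(selected_packages):
    """Return the highest CUDA version supported by all selected packages."""
    candidates = set.intersection(
        *(SUPPORTED_CUDA[pkg] for pkg in selected_packages)
    )
    if not candidates:
        raise ValueError("no CUDA version satisfies all selected packages")
    # Compare numerically ("12.10" > "12.9"), not lexicographically.
    return max(candidates, key=lambda v: tuple(map(int, v.split("."))))

print(pick_cuda_version(["rapids", "pytorch"]))  # -> 12.1
```

With only RAPIDS selected, the selector would be free to offer the newest minor version; adding PyTorch narrows the intersection to whatever minor version its builds pin.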
**Additional context**
For libraries like PyTorch, we will also need to consider which channel the package will be installed from. Officially supported PyTorch builds come from the `pytorch` channel, not conda-forge, so unless/until that changes we will need to ensure that our install command accounts for that correctly.
Possibly related ( https://github.com/rapidsai/docs/pull/470 )
#470 fixes the compatible major versions of CUDA for the TensorFlow GPU conda-forge package. It does not impact minor version compatibility.
What part of this is dependent on RAPIDS supporting CUDA 12.2?
I was able to solve this environment, and got a CUDA 12 build of pytorch from conda-forge (`pytorch 2.1.2 cuda120_py310h327d3bc_301`):

```shell
mamba create -n rapids-23.12 -c rapidsai -c conda-forge -c nvidia rapids=23.12 python=3.10 cuda-version=12.0 pytorch
```
I don't think we can offer official compatibility between RAPIDS / conda-forge and the `pytorch` channel, given that the pytorch package from the `pytorch` channel is built against nvidia channel CUDA packages. These channel conflicts are unavoidable. An example environment showing the mixture of nvidia and conda-forge packages can be generated by adding `-c pytorch` before `-c conda-forge`:

```shell
# Uses both nvidia and conda-forge CUDA Toolkit packages. Not supported.
mamba create -n rapids-23.12 -c rapidsai -c pytorch -c conda-forge -c nvidia rapids=23.12 python=3.10 cuda-version=12.0 pytorch
```
Last I tested it, this environment worked, but we can't offer support for a configuration that mixes CUDA packages from multiple channels.
At some point in the future we are hoping to make the CUDA distributions on the nvidia and conda-forge channels compatible, but until that point, I don't see any action item here. The install selector works as desired with PyTorch CUDA 12 packages from conda-forge.
I agree that this isn't addressable until the nvidia and conda-forge CTK packages are aligned, but we should consider how the selector ought to work once that day comes. To @MatthiasKohl's point, the `pytorch` channel is the officially supported medium (by both NVIDIA and PyTorch) for installing the package, so IMHO once the two are aligned we would probably want to encourage installing PyTorch from the `pytorch` channel unless and until the conda-forge package sees a level of support similar to what NVIDIA is now providing for the CTK on conda-forge.
> The install selector works as desired with PyTorch CUDA 12 packages from conda-forge.
It might work as desired, but I don't think it should. I checked today with Cliff and Piotr from DLFW, and both our DLFW teams and upstream pytorch have found many incompatibility issues with the pytorch build from conda-forge (e.g. libc version mismatches and so on). The problem is that few people install only pytorch; most rely on many other packages, which are all either pip-wheel based or based on conda's main channel and use different base packages. IMO, we should not encourage people to use this pytorch build. If RAPIDS cannot be compatible with upstream pytorch (from officially supported channels), then we should either work with DLFW to become compatible or remove that option from the install selector.