remove `torch.cuda.is_available()` check when compiling ops
`torch.cuda.is_available()` is not necessary here, and it causes https://github.com/microsoft/DeepSpeed/issues/2858 when compiling deepspeed >= 0.8.1 on a machine without a GPU (e.g. during a docker image build).
@jinzhen-lin, thanks for your contribution. But could you please provide some more details on the issue fixed by this PR? In my experience, the code in question works fine on machines without a GPU, including this CI. Thanks!
@tjruwase Exactly, I mean compiling CUDA ops on a machine without a GPU. Note that the CI doesn't build the ops.
In the mentioned issue, we encountered an error because the quantizer op (introduced in v0.8.1) needs the CUDA half operators, but the compilation arguments `-D__CUDA_NO_HALF_OPERATORS__`, `-D__CUDA_NO_HALF_CONVERSIONS__`, `-D__CUDA_NO_BFLOAT16_CONVERSIONS__`, and `-D__CUDA_NO_HALF2_OPERATORS__` are set. (You can search for those arguments on the mentioned issue page.)
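To illustrate the flag conflict, here is a minimal sketch. The four defines are the ones PyTorch's C++ extension builder adds by default; `enable_half_ops` is a hypothetical helper name for this example, not the actual DeepSpeed builder code:

```python
# PyTorch's extension builder injects these defines into nvcc invocations by
# default; they disable the CUDA half-precision operators that ops such as
# the quantizer rely on.
TORCH_HALF_DISABLE_FLAGS = [
    "-D__CUDA_NO_HALF_OPERATORS__",
    "-D__CUDA_NO_HALF_CONVERSIONS__",
    "-D__CUDA_NO_BFLOAT16_CONVERSIONS__",
    "-D__CUDA_NO_HALF2_OPERATORS__",
]

def enable_half_ops(nvcc_flags):
    # Hypothetical helper: strip the defines above so __half arithmetic
    # compiles. If this step is skipped, the quantizer op fails to build.
    return [flag for flag in nvcc_flags if flag not in TORCH_HALF_DISABLE_FLAGS]
```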
So we need those extra nvcc arguments: https://github.com/microsoft/DeepSpeed/blob/258d283181120c05fab01f2727461db19e664e8b/op_builder/builder.py#L687-L695
But those arguments are ignored because of the CUDA check here: https://github.com/microsoft/DeepSpeed/blob/258d283181120c05fab01f2727461db19e664e8b/op_builder/builder.py#L622-L631
The CUDA check doesn't pass because we cannot get the true CUDA version with `installed_cuda_version`. With this PR, we get the true CUDA version, and the issue should be fixed.
I think `installed_cuda_version` should always return the CUDA toolkit version installed on the system; it should work even on a machine without a GPU but with the CUDA toolkit installed.
@jinzhen-lin, thanks for your helpful explanation. It seems the problem is that we assume the build and target environments are the same. We recently started enabling DeepSpeed for CPU-only target environments, and we distinguish them from GPU target environments by testing for GPU availability with `torch.cuda.is_available()`. It is now clear that our approach does not work for your scenario, where you are building CUDA ops in an environment with CUDA libraries but no GPUs. The problem with this PR is that it will break builds for CPU-only environments. A more robust solution seems to be cross-compilation, and a key challenge would be enabling users to conveniently specify the target environment, implicitly or explicitly.
Please share your thoughts on this. Thanks!
@jeffra, @mrwyattii FYI
@tjruwase Sorry for not checking CPU-only builds before submitting the PR.
I notice that CPU-only target environments were introduced recently (after v0.8.0), and DeepSpeed is mainly for GPU today. So we should always assume the user wants a CUDA build, and fall back to a CPU build only when:
- we cannot find CUDA in the build environment, or the CUDA version is incompatible with the torch CUDA version
- the user specifies an environment variable (e.g. `DS_BUILD_OPS_CPU`)
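The proposed fallback logic could be sketched as follows. Note that `DS_BUILD_OPS_CPU` is only a name suggested in this discussion, and `should_build_cuda` is a hypothetical helper, not existing DeepSpeed code:

```python
import os

def should_build_cuda(toolkit_version, torch_cuda_version):
    # toolkit_version / torch_cuda_version are (major, minor) tuples,
    # or None when the corresponding CUDA installation cannot be detected.
    if os.environ.get("DS_BUILD_OPS_CPU", "0") == "1":
        return False  # user explicitly requested a CPU-only build
    if toolkit_version is None or torch_cuda_version is None:
        return False  # no usable CUDA toolkit in the build environment
    # Require matching major versions between the toolkit and torch's CUDA.
    return toolkit_version[0] == torch_cuda_version[0]
```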
@microsoft-github-policy-service agree
@jinzhen-lin, thanks for updating the PR. This is an improvement but not quite cross-compilation. Nevertheless, this will suffice for now.