Fix ddp_notebook CUDA fork check to allow passive initialization
What does this PR do?
Fixes #21389
This PR fixes the overly strict CUDA fork check in the ddp_notebook strategy, which was raising false positives in notebook environments such as Kaggle.
Problem
The previous implementation used torch.cuda.is_initialized(), which returns True even when CUDA was initialized passively, as a side effect of something else (e.g., library imports, device availability checks, or model loading). This caused the error:
RuntimeError: Lightning can't create new processes if CUDA is already initialized.
This happened even when users didn't explicitly call any CUDA functions, making it impossible to use ddp_notebook in many legitimate scenarios.
Solution
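This fix uses PyTorch's internal torch.cuda._is_in_bad_fork() function, which reports an actual bad fork state (the current process was forked from a parent that had already initialized CUDA) rather than whether CUDA has merely been initialized.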
The implementation includes a fallback to the old check for older PyTorch versions that don't have _is_in_bad_fork.
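For reviewers, the check-plus-fallback logic described above can be sketched roughly as follows. This is an illustration and not the diff itself: the helper name `cuda_blocks_fork` and the `SimpleNamespace` stand-ins are hypothetical, and the stand-ins exist only so the sketch runs without PyTorch or a GPU. In real use the argument would be `torch.cuda`.

```python
from types import SimpleNamespace


def cuda_blocks_fork(cuda) -> bool:
    """Return True if forking new processes should be refused.

    `cuda` is expected to look like the `torch.cuda` module. The name of
    this helper is hypothetical and chosen for illustration only.
    """
    # Prefer the private bad-fork check when the PyTorch version has it.
    bad_fork = getattr(cuda, "_is_in_bad_fork", None)
    if callable(bad_fork):
        return bool(bad_fork())
    # Fallback for older versions: the original, stricter check.
    return bool(cuda.is_initialized())


# Stand-ins for torch.cuda (assumptions for the demo, not real PyTorch objects).
# "modern": CUDA was initialized passively, but the process is not in a bad fork.
modern = SimpleNamespace(_is_in_bad_fork=lambda: False, is_initialized=lambda: True)
# "legacy": no _is_in_bad_fork available, so the old check decides.
legacy = SimpleNamespace(is_initialized=lambda: True)

print(cuda_blocks_fork(modern))  # False
print(cuda_blocks_fork(legacy))  # True
```

Looking the function up with `getattr` keeps the fallback independent of version parsing, at the cost of relying on a private PyTorch API whose behavior may change between releases.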
Testing
- [x] Code follows style guidelines
- [x] Changes preserve backward compatibility
- [x] Fallback exists for older PyTorch versions
📚 Documentation preview 📚: https://pytorch-lightning--21402.org.readthedocs.build/en/21402/
Codecov Report
:x: Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 79%. Comparing base (79ffe50) to head (f002d00).
:warning: Report is 13 commits behind head on master.
:white_check_mark: All tests successful. No failed tests found.
:exclamation: There is a different number of reports uploaded between BASE (79ffe50) and HEAD (f002d00).
HEAD has 3345 uploads less than BASE
| Flag | BASE (79ffe50) | HEAD (f002d00) |
|---|---|---|
| cpu | 777 | 30 |
| lightning_fabric | 195 | 0 |
| pytest | 390 | 0 |
| python3.12 | 233 | 9 |
| python3.12.7 | 232 | 9 |
| lightning | 388 | 15 |
| python3.11 | 156 | 6 |
| python3.10 | 78 | 3 |
| python | 78 | 3 |
| pytorch2.1 | 78 | 6 |
| pytest-full | 387 | 30 |
| pytorch_lightning | 194 | 15 |
| pytorch2.6 | 39 | 3 |
| pytorch2.4.1 | 38 | 3 |
| pytorch2.3 | 39 | 3 |
| pytorch2.2.2 | 39 | 3 |
| pytorch2.5.1 | 38 | 3 |
| pytorch2.9 | 39 | 3 |
| pytorch2.7 | 39 | 3 |
| pytorch2.8 | 38 | 3 |
Additional details and impacted files
@@ Coverage Diff @@
## master #21402 +/- ##
=========================================
- Coverage 87% 79% -8%
=========================================
Files 269 266 -3
Lines 23804 23772 -32
=========================================
- Hits 20626 18730 -1896
- Misses 3178 5042 +1864
Thanks @arrdel, could you update the changelogs? Other than that, your PR seems fine to me :)
Thanks @justusschock! I've updated the Fabric changelog as requested. The entry documents the fix for the DDP notebook CUDA fork check to allow passive initialization, and it is ready for review.