torchx
[Ray] Add elasticity to jobs launched on ray cluster
Elasticity: placement groups are executed as pending tasks, which GCS schedules once resources become available.
Related PR: #572
Test plan:
Mock cluster scaling with ray.cluster_utils.
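The pending-until-resources behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyCluster`, `request_group`, and `add_node` are hypothetical names invented for illustration, not Ray or TorchX APIs, and the strict FIFO placement order is a simplification rather than a claim about how GCS actually orders placement groups.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical stand-in for a cluster; not part of Ray or TorchX."""

    free_cpus: int = 0
    # Placement groups waiting for resources, as (name, cpus) pairs.
    pending: deque = field(default_factory=deque)
    # Names of placement groups that have been scheduled.
    placed: list = field(default_factory=list)

    def request_group(self, name: str, cpus: int) -> None:
        # A request never fails for lack of resources; it is queued as pending.
        self.pending.append((name, cpus))
        self._schedule()

    def add_node(self, cpus: int) -> None:
        # Scaling up frees resources and triggers another scheduling pass.
        self.free_cpus += cpus
        self._schedule()

    def _schedule(self) -> None:
        # Assumption: place groups in FIFO order while the head of the queue fits.
        while self.pending and self.pending[0][1] <= self.free_cpus:
            name, cpus = self.pending.popleft()
            self.free_cpus -= cpus
            self.placed.append(name)


# Start with an empty cluster, so both requests stay pending at first.
cluster = ToyCluster()
cluster.request_group("replica-0", cpus=2)
cluster.request_group("replica-1", cpus=2)

# Each added node lets one more pending group be placed.
cluster.add_node(cpus=2)
placed_after_first_node = list(cluster.placed)
cluster.add_node(cpus=2)
placed_after_second_node = list(cluster.placed)
```

In this model a job submitted to an undersized cluster simply waits and makes progress as nodes join, which is the property the test plan exercises by mocking cluster scaling.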
Codecov Report
Merging #580 (ac0e8a9) into main (515a265) will increase coverage by 0.18%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## main #580 +/- ##
==========================================
+ Coverage 94.76% 94.94% +0.18%
==========================================
Files 67 67
Lines 4047 4134 +87
==========================================
+ Hits 3835 3925 +90
+ Misses 212 209 -3
| Impacted Files | Coverage Δ | |
|---|---|---|
| torchx/schedulers/ray_scheduler.py | 95.23% <ø> (-0.03%) | :arrow_down: |
| torchx/components/dist.py | 96.42% <100.00%> (+7.06%) | :arrow_up: |
| torchx/schedulers/ray/ray_common.py | 100.00% <100.00%> (ø) | |
| torchx/schedulers/ray/ray_driver.py | 98.27% <100.00%> (+2.44%) | :arrow_up: |
| torchx/specs/api.py | 98.40% <100.00%> (+<0.01%) | :arrow_up: |
| torchx/schedulers/kubernetes_scheduler.py | 93.80% <0.00%> (-0.15%) | :arrow_down: |
| torchx/schedulers/aws_batch_scheduler.py | 89.43% <0.00%> (-0.05%) | :arrow_down: |
| torchx/cli/cmd_list.py | 100.00% <0.00%> (ø) | |
| torchx/schedulers/local_scheduler.py | 93.12% <0.00%> (ø) | |
| ... and 4 more | | |
@d4l3k has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.