[Data] Add heterogeneous Ray Data + Train release test
Why are these changes needed?
- Modifies the existing multi node train benchmark code to enable testing with heterogeneous clusters.
- Adds a new release test, `read_images_train_1_gpu_5_cpu`, which uses 1 GPU + 5 CPUs and runs on an empty model.
Results from test run of the newly added heterogeneous release test (1 GPU, 5 CPU):
cache-none = {'time': 1598.6025146509999, 'tput': 1696.9348484496704, 'extra_metrics': {}}
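For context on the shape of the result above, here is a minimal sketch of how a `time`/`tput` pair could be produced when the model is empty, so that only data loading is measured. This is not the benchmark code changed in this PR: `run_empty_model_benchmark`, its `clock` parameter, and the synthetic batches are hypothetical, and it assumes `time` is in seconds and `tput` is rows per second.

```python
import time


def run_empty_model_benchmark(batches, clock=time.perf_counter):
    """Consume every batch without any model work and report timing.

    Hypothetical helper for illustration only; assumes `time` is elapsed
    seconds and `tput` is rows consumed per second.
    """
    start = clock()
    num_rows = 0
    for batch in batches:
        # "Empty model": read the batch but skip forward/backward passes,
        # so the measurement reflects the data pipeline alone.
        num_rows += len(batch)
    elapsed = clock() - start
    tput = num_rows / elapsed if elapsed > 0 else float("inf")
    return {"time": elapsed, "tput": tput, "extra_metrics": {}}


# Synthetic stand-in for an image dataset: 10 batches of 32 items each.
fake_batches = [[0] * 32 for _ in range(10)]
result = run_empty_model_benchmark(fake_batches)
print(result)
```

Injecting the clock keeps the sketch deterministic under test; a real run would use the default `time.perf_counter`.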
Related issue number
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [x] Release tests
- [ ] This PR is not tested :(