[wip] Add serve ha chaos test into nightly test.
Why are these changes needed?
Related issue number
Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- [ ] I've run scripts/format.sh to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
This doesn't need to be picked to the release branch, I guess?
I think we at least need to test it on Ray 2.0. I can run the test manually for 2.0.
Can you kick off an example run through: https://buildkite.com/ray-project/release-tests-pr/builds/14266
@simon-mo somehow the job is just hanging there, waiting to be scheduled. I ran it manually, and here is the link to the cluster:
https://console.anyscale.com/o/anyscale-internal/projects/prj_2xR6uT6t7jJuu1aCwWMsle/clusters/ses_m14XtVyBXxdaEyJpGihe6JeX?command-history-section=command_history
I'll wait until it's finished.
@sihanwang41 moving it to a separate file makes sense, but I don't know how to do this with k8s YAML. Do you know how? For example, having one field loaded from another file.
@sihanwang41 please take another look. This is the best I could figure out. Please let me know if you'd prefer another approach.
The test failure is unrelated.
I'll monitor this test for a while to make sure there are no issues.