[ci] Move to new hierarchical docker structure + pipeline
Why are these changes needed?
This PR moves our Buildkite CI to a new hierarchical Docker structure, which will be used with the new Buildkite pipeline.
When merging this PR, the old behavior will still work, i.e. the old pipeline is still in place.
After merging this PR, we can build the base images for the master branch, and then switch the CI pipelines to use the new build structure.
Once this switch has been done, the following files will be removed:
- `./buildkite/pipeline.yml` - this has been split into `pipeline.test.yml` and `pipeline.build.yml`
- `./buildkite/Dockerfile` - this has been moved (and split) to `./ci/docker/`
- `./buildkite/Dockerfile.gpu` - this has been moved (and split) to `./ci/docker/`
The new structure is as follows:
- `./ci/docker` contains hierarchical docker files that will be built by the pipeline.
  - `Dockerfile.base_test` contains common dependencies
  - `Dockerfile.base_build` inherits from it and adds build-specific dependencies, e.g. llvm, nvm, java
  - `Dockerfile.base_ml` inherits from `base_test` and adds ML dependencies, e.g. torch, tensorflow
  - `Dockerfile.base_gpu` depends on a cuda image and otherwise has the same contents as `base_test` and `base_ml` combined
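To illustrate the hierarchy above, here is a minimal Python sketch that derives an order in which the base images could be built so that every parent exists before its children. The dictionary layout, the `cuda` placeholder for the external CUDA image, and the function name are assumptions made for illustration; this is not the actual pipeline code.

```python
from graphlib import TopologicalSorter

# Each base image maps to the set of images it is built FROM.
# "cuda" is a placeholder for the external CUDA image (assumption);
# it is pulled, not built, so it is filtered out of the result.
BASE_IMAGE_PARENTS = {
    "base_test": set(),
    "base_build": {"base_test"},
    "base_ml": {"base_test"},
    "base_gpu": {"cuda"},
}

EXTERNAL_IMAGES = {"cuda"}


def base_image_build_order(parents=BASE_IMAGE_PARENTS, external=EXTERNAL_IMAGES):
    """Return base images ordered so that each parent precedes its children."""
    order = TopologicalSorter(parents).static_order()
    return [image for image in order if image not in external]


if __name__ == "__main__":
    print(base_image_build_order())
```

Only the relative order of a parent and its children is guaranteed; independent images such as `base_build` and `base_ml` may appear in either order.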
In each build, we do the following:
- `Dockerfile.build` is built on top of `Dockerfile.base_build`. Dependencies are re-installed, which is mostly a no-op (except if they changed from when the base image was built)
- `Dockerfile.test` is built on top of `Dockerfile.base_test`, and the extracted Ray installation from `Dockerfile.build` is injected
- The same is true respectively for `ml` and `gpu`.
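The per-build steps above can be sketched as a small plan: the build image comes first, and each test-flavor image is then derived from its base image with the Ray installation taken from the build image. The `BuildStep` class, its field names, and the `Dockerfile.ml` / `Dockerfile.gpu` names are assumptions inferred from the description, not the pipeline's real data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class BuildStep:
    """One per-build image (hypothetical model, for illustration only)."""
    dockerfile: str                        # Dockerfile built in this step
    base: str                              # base Dockerfile it is built on top of
    inject_ray_from: Optional[str] = None  # image supplying the Ray installation


def per_build_steps(flavors: Sequence[str] = ("test", "ml", "gpu")) -> List[BuildStep]:
    """Return the per-build steps, with the build image first."""
    # The build image compiles Ray on top of the build-specific base image.
    steps = [BuildStep("Dockerfile.build", "Dockerfile.base_build")]
    # Each flavor reuses its own base image and receives the extracted
    # Ray installation from the build image.
    for flavor in flavors:
        steps.append(
            BuildStep(
                dockerfile=f"Dockerfile.{flavor}",
                base=f"Dockerfile.base_{flavor}",
                inject_ray_from="Dockerfile.build",
            )
        )
    return steps


if __name__ == "__main__":
    for step in per_build_steps():
        print(step)
```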
The pipelines have been split, and a new attribute `NO_WHEELS_REQUIRED` has been added to identify tests that can be early-started. Early start means that the last available branch image is used and the current code revision is checked out on top of it.
See https://github.com/ray-project/buildkite-ci-pipelines/ for the pipeline logic.
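As a rough illustration of the early-start idea, the sketch below partitions test steps into those that can start immediately on the last available branch image and those that must wait for the current build. Representing a step as a dictionary with a boolean `NO_WHEELS_REQUIRED` key is an assumption made here; the way the attribute is actually attached to a step is defined by the pipeline logic in the repository linked above.

```python
from typing import Dict, List, Tuple

Step = Dict[str, object]


def split_early_start(steps: List[Step]) -> Tuple[List[Step], List[Step]]:
    """Split steps into (early_start, wait_for_build).

    A step is treated as early-startable when its hypothetical
    NO_WHEELS_REQUIRED flag is truthy; a missing flag means it waits.
    """
    early: List[Step] = []
    wait: List[Step] = []
    for step in steps:
        if step.get("NO_WHEELS_REQUIRED", False):
            early.append(step)
        else:
            wait.append(step)
    return early, wait


if __name__ == "__main__":
    # Step labels are made up for the example.
    example_steps = [
        {"label": "lint", "NO_WHEELS_REQUIRED": True},
        {"label": "wheel-dependent test"},
        {"label": "python unit tests", "NO_WHEELS_REQUIRED": True},
    ]
    early, wait = split_early_start(example_steps)
    print("early start:", [s["label"] for s in early])
    print("wait for build:", [s["label"] for s in wait])
```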
Additionally, this PR identified two CI regressions that had not been caught previously: the minimal install tests did not properly install the respective Python versions, and some runtime environment tests do not work with later Ray versions. These should be addressed separately, and I'll create issues for them once this PR is merged.
Related issue number
Checks
- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(