
[ci] Move to new hierarchical docker structure + pipeline

Open krfricke opened this issue 3 years ago • 0 comments

Why are these changes needed?

This PR moves our Buildkite CI to a new hierarchical Docker structure, which will be used with the new Buildkite pipeline.

Merging this PR does not change the existing behavior: the old pipeline stays in place.

After merging this PR, we can build the base images for the master branch, and then switch the CI pipelines to use the new build structure.

Once this switch has been done, the following files will be removed:

  • ./buildkite/pipeline.yml - this has been split into pipeline.test.yml and pipeline.build.yml
  • ./buildkite/Dockerfile - this has been moved (and split) to ./ci/docker/
  • ./buildkite/Dockerfile.gpu - this has been moved (and split) to ./ci/docker/

The new structure is as follows:

  • ./ci/docker contains hierarchical Dockerfiles that will be built by the pipeline:
    • Dockerfile.base_test contains common dependencies
    • Dockerfile.base_build inherits from base_test and adds build-specific dependencies, e.g. llvm, nvm, java
    • Dockerfile.base_ml inherits from base_test and adds ML dependencies, e.g. torch, tensorflow
    • Dockerfile.base_gpu depends on a cuda image and otherwise has the same contents as base_test and base_ml combined
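
The inheritance between the base images can be read as a small dependency graph, which is one way to derive the order in which they have to be built. A minimal Python sketch, not the actual pipeline logic: `BASE_IMAGES` and `build_order` are illustrative names that do not appear in this PR, and `"cuda"` is a placeholder for whichever CUDA image base_gpu starts from.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Parent of each base image, taken from the list above.
# "cuda" is a placeholder for the external CUDA image (assumption);
# None means the image has no parent inside ./ci/docker.
BASE_IMAGES = {
    "base_test": None,
    "base_build": "base_test",
    "base_ml": "base_test",
    "base_gpu": "cuda",
}


def build_order(images):
    """Return image names so that every in-repo parent precedes its children."""
    # Only parents that are themselves in the mapping count as dependencies;
    # external images such as "cuda" are assumed to exist already.
    graph = {
        name: ({parent} if parent in images else set())
        for name, parent in images.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = build_order(BASE_IMAGES)
```

In any order this returns, base_test comes before base_build and base_ml; base_gpu has no in-repo parent and can be built at any point.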

In each build, we do the following:

  • Dockerfile.build is built on top of Dockerfile.base_build. Dependencies are re-installed, which is mostly a no-op (unless they have changed since the base image was built)
  • Dockerfile.test is built on top of Dockerfile.base_test, and the extracted Ray installation from Dockerfile.build is injected
  • The same applies to ml and gpu, respectively.
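
The per-build steps above can be summarised as a short plan: the build image comes first, and every other image takes its Ray installation from it. A hedged Python sketch; `PER_BUILD` and `per_build_plan` are hypothetical names, and the ml/gpu entries are inferred from the "same applies" remark rather than from file names given in this PR.

```python
# Per-build image -> base image it is built on top of.
# The "ml" and "gpu" rows are an inference from the description (assumption).
PER_BUILD = {
    "build": "base_build",
    "test": "base_test",
    "ml": "base_ml",
    "gpu": "base_gpu",
}


def per_build_plan(per_build):
    """Return (image, base, ray_source) tuples in a valid build order.

    ray_source is the image whose extracted Ray installation gets injected,
    or None for the build image, which produces that installation itself.
    """
    plan = [("build", per_build["build"], None)]
    for name, base in per_build.items():
        if name != "build":
            plan.append((name, base, "build"))
    return plan


plan = per_build_plan(PER_BUILD)
```

The plan only encodes ordering and provenance; how the installation is extracted and injected is left to the Dockerfiles themselves.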

The pipelines have been split, and a new attribute NO_WHEELS_REQUIRED has been added to identify tests that can be started early. Early start means that the last available branch image is used and the current code revision is checked out on top of it.
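
To illustrate the early-start idea, the sketch below partitions pipeline steps by whether they carry NO_WHEELS_REQUIRED. It is a sketch under assumptions, not the logic from the buildkite-ci-pipelines repository: it assumes each step is a dict with a `conditions` list holding the attribute, and `split_steps` plus the sample step labels are made up for illustration.

```python
def split_steps(steps):
    """Split steps into (early_start, needs_wheels).

    Assumption: the attribute is stored as a string in each step's
    "conditions" list; a step without that key is treated as needing wheels.
    """
    early, late = [], []
    for step in steps:
        if "NO_WHEELS_REQUIRED" in step.get("conditions", []):
            early.append(step)  # can run on the last available branch image
        else:
            late.append(step)   # has to wait for the current build
    return early, late


# Hypothetical steps, for illustration only.
steps = [
    {"label": "lint", "conditions": ["NO_WHEELS_REQUIRED"]},
    {"label": "core tests", "conditions": []},
    {"label": "docs"},
]
early, late = split_steps(steps)
```

Steps in `early` could be scheduled immediately, while steps in `late` would be queued behind the build image.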

See https://github.com/ray-project/buildkite-ci-pipelines/ for the pipeline logic.

Additionally, this PR identified two CI regressions that had not been caught previously: the minimal install tests do not properly install the respective Python versions, and some runtime environment tests do not work with later Ray versions. These should be addressed separately, and I'll create issues for them once this PR is merged.

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

krfricke • Sep 20 '22 14:09