Allows `container_pull` to postpone the image download to the build phase.

Usage:

```python
container_pull(
    name = "base",
    registry = "gcr.io",
    repository = "my-project/my-base",
    digest = "sha256:deadbeef",
    lazy_download = True,  # Postpones the download to the build phase.
)
```
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: igorgatis
To complete the pull request process, please assign davegay after the PR has been reviewed.
You can assign the PR to them by writing /assign @davegay in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/assign @smukherj1 /assign @davegay /assign @eytankidron
We have some large images that we pull in our WORKSPACE, and fetching them and creating the intermediate forms even when they're not used is a huge performance cost, so this would be amazing to see added.
Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.
We can consult with the core bazel team but I don't think this is going to work as-is.
```python
genrule(
    name = "download",
    message = "Fetching",
    outs = ["{outs}"],
    cmd = "{args}",
    tools = ["{tool}"],
)
```
Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?
@ulfjack any comment here?
> Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.
I believe everybody agrees with that. However, this "best practice" makes the scenario of many `container_pull` calls in WORKSPACE impractical.

This is why the `lazy_download` flag is false by default. Users need to be aware of the pros and cons; for instance, they might enable this flag only for rarely used or very large images.
> Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?
We can mitigate this issue by passing the expected SHA256 digest to the genrule call and verifying it after download. It should be pretty easy to implement. Should I proceed with that?
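As a rough sketch of that verification (file names and the digest source here are illustrative, not part of this patch), the genrule command could recompute the digest of whatever it downloaded and fail the action on a mismatch:

```shell
#!/bin/sh
# Illustrative only: "image.tar" stands in for the blob the genrule downloads,
# and "expected" for the digest that container_pull already pins in WORKSPACE.
set -e
printf 'fake layer bytes' > image.tar   # stand-in for the registry download
expected="sha256:$(sha256sum image.tar | cut -d' ' -f1)"

# The verification step the genrule cmd would run after downloading:
actual="sha256:$(sha256sum image.tar | cut -d' ' -f1)"
if [ "$actual" != "$expected" ]; then
    echo "digest mismatch: got $actual, want $expected" >&2
    exit 1
fi
echo "digest verified"
```

Since the digest is fixed in WORKSPACE before the build starts, tampered content would fail the action at build time rather than being silently accepted.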
This might be a relevant reference: https://github.com/bazelbuild/rules_python#fetch-pip-dependencies-lazily-experimental
Many organizations use an HTTP cache or Artifactory to mitigate this. Can you not use that?
Not every organization has the flexibility to stand up and maintain something like that. There's still a large startup cost in bigger repos when all images must be downloaded even though one or none will be used. I think with sufficient documentation this would be a fantastic feature.
Perhaps we should discuss at next maintainer meeting. @igorgatis @sluongng @UebelAndre would you be willing to join a call in a few weeks to go over this?
For sure. Please send an invite to sluongng at gmail dot com.
What is the conclusion on this? It is a must-have for polyglot monorepos and, as stated above, has already been implemented in rules_python, so there is precedent for it. I can attest that the patch works perfectly well (thanks @igorgatis btw!) and would really love to see this upstreamed.
@igorgatis could you fix Buildifier error:
##### :bazel: buildifier: found 1 format issue in your WORKSPACE, BUILD and *.bzl files

Please download [buildifier 0.22.0](https://github.com/bazelbuild/buildtools/releases/tag/0.22.0) and run the following command in your workspace:

```
buildifier container/pull.bzl
```

##### :bazel: buildifier: found 1 lint issue in your WORKSPACE, BUILD and *.bzl files
Another interesting thing is compatibility with private registries. For example, the docker-credential-gcloud helper wants a writable $HOME/.config/gcloud/logs/ and probably read access to cached tokens. Maybe the "no-sandbox" (and perhaps also "local") tag is required to make this work for private registries with credential helpers.
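A hedged sketch of what that tagging could look like (target names, the puller label, and the digest below are made up for illustration; this is not from the patch):

```python
# BUILD sketch (hypothetical): a lazily downloading action tagged so that a
# credential helper can reach $HOME and the action stays on the local machine.
genrule(
    name = "lazy_pull_base",
    outs = ["base.tar"],
    cmd = "$(location //tools:puller) --digest=sha256:deadbeef --out=$@",  # illustrative
    tools = ["//tools:puller"],
    tags = [
        "no-sandbox",  # let docker-credential-gcloud read/write $HOME/.config/gcloud
        "local",       # avoid remote executors, which usually lack network and credentials
    ],
)
```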