rules_docker icon indicating copy to clipboard operation
rules_docker copied to clipboard

Allows container_pull to postpone image download to build phase.

Open igorgatis opened this issue 5 years ago • 13 comments

Usage:

container_pull(
    name = "base",
    registry = "gcr.io",
    repository = "my-project/my-base",
    digest = "sha256:deadbeef",
    lazy_download = True, # Will postpone download to build phase.
)

igorgatis avatar Mar 08 '21 17:03 igorgatis

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: igorgatis To complete the pull request process, please assign davegay after the PR has been reviewed. You can assign the PR to them by writing /assign @davegay in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot avatar Mar 08 '21 17:03 k8s-ci-robot

/assign @smukherj1 /assign @davegay /assign @eytankidron

igorgatis avatar Mar 08 '21 18:03 igorgatis

We have some large images we pull in in our WORKSPACE, and fetching and creating the intermediate forms even when they're not used is a huge performance cost, so this'd be amazing to see added.

zoidyzoidzoid avatar Mar 10 '21 23:03 zoidyzoidzoid

Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.

We can consult with the core bazel team but I don't think this is going to work as-is.

genrule(
    name = "download",
    message = "Fetching",
    outs = ["{outs}"],
    cmd = "{args}",
    tools = ["{tool}"],
)

Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?

@ulfjack any comment here?

pcj avatar Mar 17 '21 00:03 pcj

Unfortunately, using the network during an action is considered bad practice and IIRC network access is disabled in many execution contexts, particularly remote execution.

I believe everybody agrees with that. However, the "best practice" makes the scenario of multiple container_pull calls in WORKSPACE impractical.

This is why the field lazy_download flag is false by default. User needs to be aware of pros-and-cons. Perhaps, user might make use of this flag only for rarely used images or HUGE images.

Also, what are the security implications of this change? Is it possible for an attacker to replace content in an image by deferring download to the build phase? Is hash integrity preserved?

We can mitigate this issue passing expected SHA to genrule call. It should be pretty easy to implement. Should I proceed with that?

igorgatis avatar Mar 22 '21 20:03 igorgatis

https://github.com/bazelbuild/rules_python#fetch-pip-dependencies-lazily-experimental

This might be a relevant reference

sluongng avatar Mar 27 '21 08:03 sluongng

Many organizations use an HTTP_CACHE or artifactory to mitigate this. Can you not use that?

pcj avatar Mar 29 '21 18:03 pcj

Not every organization has the flexibility to stand something like that up and maintain it. There's still a large startup cost in bigger repos when having to download all images when one or none will be used. I think with sufficient documentation this would be a fantastic feature.

UebelAndre avatar Apr 02 '21 14:04 UebelAndre

Perhaps we should discuss at next maintainer meeting. @igorgatis @sluongng @UebelAndre would you be willing to join a call in a few weeks to go over this?

pcj avatar Apr 02 '21 15:04 pcj

Perhaps we should discuss at next maintainer meeting. @igorgatis @sluongng @UebelAndre would you be willing to join a call in a few weeks to go over this?

For sure. Please send and invite to sluongng at gmail dot com.

sluongng avatar Apr 02 '21 15:04 sluongng

What is the conclusion on this? It is a must have for polyglot monorepos and as stated above, has already been implemented in rules_python so there is a precedent for it. I can assess that the patch works perfectly well (thanks @igorgatis btw!) and would really love this to be upstreamed.

thekyz avatar Oct 07 '21 18:10 thekyz

@igorgatis could you fix Buildifier error:

##### :bazel: buildifier: found 1 format issue in your WORKSPACE, BUILD and *.bzl files
--
  | Please download <a href="https://github.com/bazelbuild/buildtools/releases/tag/0.22.0">buildifier 0.22.0</a> and run the following command in your workspace:<br/><pre><code>buildifier container/pull.bzl</code></pre>
  | ##### :bazel: buildifier: found 1 lint issue in your WORKSPACE, BUILD and *.bzl files

mindaugasrukas avatar Jan 27 '22 21:01 mindaugasrukas

Another interesting thing is compatibility with private registries, for example docker-credential-gcloud helper wants writable $HOME/.config/gcloud/logs/ and probably read access for cached tokens. Maybe "no-sandbox" (and maybe also "local") tag is required to make it work for private registries with credential helpers.

dmivankov avatar Oct 31 '22 23:10 dmivankov