Platforms 0.10 and 0.11 have issues with Multi-Arch Daemon builds.
Summary
During a daemon build at platform level 0.10 or 0.11, with a multi-arch run image, the restorer will
- reach out to the repository of the run image to identify the architecture specific digest, and update the analyzed.toml with the sha reference of the targeted architecture.
This is problematic, as
- the run image may be a daemon only image, not even present in the repository it may be tagged for
- the rewrite of the run image reference makes an image pull policy of 'never' impossible to honor, because the reference is rewritten based on information from a remote repo. This may lead to an unexpected run image being selected, and means a pull policy of never is not enough to prevent additional image pulls.
- even if the platform has pulled the run image using the target architecture, it will not be present in the daemon under the architecture specific sha (when using docker engine api to pull an image with a platform, where the image is multi-arch, the image ends up in the local daemon with the digest of the combined manifest, not that of the architecture specific manifest). Docker engine offers no api to query the architecture specific digests for a multi-arch image, making it difficult for a platform to pull the expected image required for the export step.
- It is unclear for platform 0.10 and 0.11 which builds will need to repull the run image to satisfy the restorer updated digest reference, as the scenarios under which restorer will rewrite the image are not specified. While the behavior has been observed for extension builds using run image switching to a multi-arch run image, it feels likely this affects all daemon builds with multi-arch run images, where restorer updates the run image reference. This should be documented in the platform spec for 0.10 and 0.11, so platform implementers are able to understand when image pulls are required.
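To illustrate the gap between the combined manifest digest and the architecture specific digest, here is a minimal sketch of selecting a per-platform digest from an OCI image index. The index content, the placeholder digests and the function name `select_platform_digest` are made up for illustration; this is not the lifecycle's actual implementation.

```python
from typing import Optional

# Hypothetical OCI image index for a multi-arch run image.
# The digests are placeholders, not real image digests.
SAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}


def select_platform_digest(index: dict, os_name: str, arch: str) -> Optional[str]:
    """Return the digest of the manifest matching os/arch, or None if absent."""
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return manifest["digest"]
    return None
```

The digest returned here belongs to one entry inside the index, whereas the daemon records the image under the digest of the index itself, so a reference rewritten to the per-platform digest will not match what the daemon holds.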
Reproduction
Use a platform that, during an extension based build using the daemon, pulls the run image after the detect phase has identified the new image, with a multi-arch run image as the target. Note that the exporter phase will fail because it is unable to locate the run image referenced by the arch specific digest (currently that error presents itself as a top layer sha issue, as per https://github.com/buildpacks/lifecycle/issues/1456 )
Expected behavior
From 0.12 onwards, restorer has a -daemon flag that stops this behavior, and uses the id of the run image in the daemon instead. This solves all the challenges above, but does not help when using Builders that request platforms 0.10/0.11.
An ideal fix would be to add the -daemon flag to the 0.10 and 0.11 platforms, if making changes to a platform level after release is allowed.
If the 0.10/0.11 platform specs cannot be changed now, then resolving this becomes much harder. At 0.10 and 0.11, restorer has no knowledge of whether the build is for a daemon or not. Perhaps the analyzed.toml could be updated to carry this flag from the analyze step: the run image reference could be supplemented with a run image daemon flag, which restorer could use internally as if the -daemon behavior from 0.12 had been requested. (and I guess 0.12 and onwards could also honor that flag if -daemon is not passed explicitly)
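The proposal above could be sketched roughly as follows. Everything here is hypothetical: the `analyzed_daemon_hint` value stands in for a flag that does not exist in analyzed.toml today, and the two callables stand in for whatever the restorer would use to look up the daemon image id or the remote architecture specific digest.

```python
from typing import Callable, Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn a platform API string such as '0.11' into (0, 11) for comparison."""
    return tuple(int(part) for part in version.split("."))


def choose_run_image_reference(
    reference: str,
    platform_api: str,
    daemon_flag: bool,
    analyzed_daemon_hint: Optional[bool],
    daemon_image_id: Callable[[str], str],
    remote_arch_digest: Callable[[str], str],
) -> str:
    """Decide which run image reference to record (hypothetical logic)."""
    if version_tuple(platform_api) >= (0, 12):
        # 0.12+: the real -daemon flag wins, with the hypothetical hint as fallback.
        use_daemon = daemon_flag or bool(analyzed_daemon_hint)
    else:
        # 0.10/0.11: no -daemon flag exists, so only the hypothetical hint applies.
        use_daemon = bool(analyzed_daemon_hint)
    if use_daemon:
        return daemon_image_id(reference)
    return remote_arch_digest(reference)
```

With a hint set, a 0.10 or 0.11 build would record the daemon image id and skip the remote lookup; without it, behavior stays as it is today.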
Either way, it's going to be a behavioral change for 0.10 and 0.11, but arguably the current behavior for multi arch daemon builds at 0.10 and 0.11 is broken, so maybe the change is acceptable.
Context
lifecycle version
tested with 0.20.5, likely exists in every version since multi-arch was added.
platform version(s)
0.10 , 0.11
anything else?
only affects daemon builds with multi-arch run images
the run image may be a daemon only image, not even present in the repository it may be tagged for
I don't understand this part - how did the "wrong" tag get in analyzed.toml in the first place?
the rewrite of the run image reference prevents an image pull policy of 'never' being possible
This could be a bug in the lifecycle. I don't think the restorer should care about the run image if we're not using extensions (for Platform API less than 0.12). However, if extensions are present in the group then we require that the run image be a remote image. I believe this is somewhere in the spec.
when using docker engine api to pull an image with a platform, where the image is multi-arch, the image ends up in the local daemon with the digest of the combined manifest
You are correct. I believe in pack we are getting around this with an extra image pull, but I would need to confirm
it feels likely this affects all daemon builds with multi-arch run images, where restorer updates the run image reference
The check here helps, but doesn't avoid all cases where this issue might occur
An ideal fix would be to add the -daemon flag to the 0.10 and 0.11 platforms.. if making changes to a platform level after release is allowed.
We have never done this before 😕
the run image may be a daemon only image, not even present in the repository it may be tagged for
I don't understand this part - how did the "wrong" tag get in analyzed.toml in the first place?
If you create a run image locally in your daemon, then tag that local daemon image so it has registry info (like you would if you were about to push the image to a remote registry), but do not do that push, then you have a local image in the daemon that is not present in the registry it has been tagged for.
e.g.
Create a local run image called myrunimage:1.0, then do docker image tag myrunimage:1.0 quay.io/stilettos/myrunimage:1.0 and then run a build that refers to a run image of quay.io/stilettos/myrunimage:1.0 (say via an extension switching the run image to that image reference).
Restorer will assume the image is present in the registry from the image tag, and will reach out to the registry and fail, even though the image is already present in the daemon. (I guess also, it's plausible the image could exist remotely but be entirely unrelated to the local image the user is attempting to ensure the build uses, say if they want repeatable builds where all artifacts have already been cached to the daemon before the build is run).
However if extensions are present in the group then we require that the run image be a remote image. I believe this is somewhere in the spec
Hmm, if so, I wonder what the correct handling is for the scenario above, where a local image already exists that differs in content from the remote. Would the user expect the local image to be overwritten, even if pull policy is IF_NOT_PRESENT or NEVER?
I believe in pack we are getting around this with an extra image pull
Aye, I've got multiple extra pulls within the java platform, at various stages after analyzed.toml gets updated. With the extra pulls, you can have a build succeed, but realistically only with an effective pull policy of ALWAYS. Once we get to platform 0.12, the problem is resolved because the remote lookup is skipped when using a daemon based image.
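As a rough illustration of why the rewritten reference defeats a NEVER pull policy, here is a simplified model of a platform's pull decision. The `PullPolicy` enum, the `plan_pull` function and the placeholder digest are assumptions made for this sketch, not code from pack or the java platform.

```python
from enum import Enum
from typing import Set


class PullPolicy(Enum):
    ALWAYS = "always"
    IF_NOT_PRESENT = "if-not-present"
    NEVER = "never"


def plan_pull(policy: PullPolicy, reference: str, daemon_references: Set[str]) -> str:
    """Return 'pull', 'use-local' or 'fail' for a reference under a pull policy."""
    if policy is PullPolicy.ALWAYS:
        return "pull"
    if reference in daemon_references:
        return "use-local"
    if policy is PullPolicy.IF_NOT_PRESENT:
        return "pull"
    # NEVER and the reference is not in the daemon: nothing can satisfy it.
    return "fail"


# The daemon only knows the image by its tag.
DAEMON = {"quay.io/stilettos/myrunimage:1.0"}
# Hypothetical reference after the rewrite, using a placeholder digest.
REWRITTEN = "quay.io/stilettos/myrunimage@sha256:" + "b" * 64
```

In this model `plan_pull(PullPolicy.NEVER, "quay.io/stilettos/myrunimage:1.0", DAEMON)` can use the local image, while the same policy with `REWRITTEN` fails because the daemon holds nothing under that digest.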