Shared lock file
Hello friends!
First, I want to thank you for your work on this so far. It's a really nice little way to put together alpine containers and leverage the developing wolfi ecosystem.
I was hoping to get some perspective on what a good pattern might be for sharing a lockfile between multiple instantiations of apko_image. In a larger repository, where I'd want to create multiple images, there's going to be a substantial number of translate_apko_lock calls within my MODULE file, each of which will manage a separate cache of these (potentially common) dependencies. What would be nice is if I could have a single lock file that reflected a snapshot of dependencies from my upstream, which could then be shared between multiple different apko_images. In this model, I'm able to ensure that multiple containers are using the same version of some dependencies (and therefore upgrade together), and only need to manage one lockfile. This also makes it easier to wrap apko_image in a macro or rule, knowing that I wouldn't have to modify the MODULE file as well.
I think I could hack this right now, by having a beefy config file that contained all expected dependencies from the upstream, locking that config, and then passing it into translate_apko_lock so it could then be used as the contents for all my apko_images, but that seems a little inelegant. I see how this could maybe be reflected as a custom rule, but it would require access to the private apko_run rule.
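Concretely, the workaround described above might look something like this. All names here are hypothetical, and the attribute shapes are a sketch rather than verified rules_apko usage:

```starlark
# MODULE/WORKSPACE: translate ONE lock produced from the "beefy" config.
translate_apko_lock(
    name = "shared_lock",
    lock = "//:apko.shared.lock.json",
)

# BUILD.bazel: every image reuses the same translated contents.
apko_image(
    name = "image_a",
    config = "image_a.apko.yaml",
    contents = "@shared_lock//:contents",
    tag = "image-a:latest",
)

apko_image(
    name = "image_b",
    config = "image_b.apko.yaml",
    contents = "@shared_lock//:contents",
    tag = "image-b:latest",
)
```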
Would love your thoughts on whether or not this is a common use case, and if so, what tweaks could be made to make it possible. I'm happy to assist in a PR.
Thanks!
> I think I could hack this right now, by having a beefy config file that contained all expected dependencies from the upstream, locking that config, and then passing it into translate_apko_lock so it could then be used as the contents for all my apko_images, but that seems a little inelegant.
Having just tried this, note it doesn't work with the current apko/rules_apko tools. It seems the lockfile is treated as the actual packages to install, and the list in apko.yaml is silently ignored (except for some misleading informational output).
Just wanted to dump some thoughts on the current implementation and why a repository_rule per image might be unavoidable. Just a disclaimer - it's likely some information here is not correct :D
- `repository_rule` is a way to download remote contents in bazel (in the end most of the calls end up with `rctx.download`). When any file in the repository is referenced, the whole repository is downloaded.
- `translate_apko_lock` generates multiple repositories:
  - primary: the one that contains the `contents` target and the `apko_repositories` macro, which declares all secondary repository rules
  - secondaries: for each package there is a separate `repository_rule`, which handles the download of that single package and exposes a target for it
- The `contents` target must declare all packages' targets as dependencies. This is done by generating a BUILD.bazel in the primary repository of `translate_apko_lock`.
- Dependencies must be declared explicitly for the analysis phase.
- To get the list of dependencies you need to read the lockfile, which for a build rule can only happen in the execution phase. On the other hand, a repository rule can read the file.
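For illustration, the BUILD.bazel generated in the primary repository might look roughly like this (a hypothetical reconstruction with made-up repository names, not the actual generated output):

```starlark
# Hypothetical sketch: each srcs entry points at a secondary repository
# that downloads exactly one package from the lockfile.
filegroup(
    name = "contents",
    srcs = [
        "@wolfi_base_glibc//:package",
        "@wolfi_base_bash//:package",
        "@wolfi_base_ca_certificates_bundle//:package",
        # one entry per package in the lockfile
    ],
    visibility = ["//visibility:public"],
)
```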
There could be a model where there is one big translate_apko_lock that contains all of the packages' targets, but the lockfile is provided separately to the apko_image:
```starlark
apko_image(
    contents = all_packages_for_all_images,
    lockfile = lockfile_for_specific_image,
)
```
But then building a single image would download packages needed for all images in your repo. If your use case is `bazel build //...` then it doesn't matter much (aside from the many extra symlinks added for each apko_image rule), since eventually you would download all the packages anyway. But as a generic solution this would be wasteful.
Note about the repository cache: it is content-addressed by hash, so even if multiple lockfiles contain the same package, it will be downloaded only once.
Sorry to necro this issue, but I've been tinkering on this some lately.
I think this is possible - but would require either a different rule, or some heavy modification to apko_image. I think the current implementation is somewhat restricted by the fact that we're referencing an apko file inside the repository, instead of building one dynamically inside the rule.
Here's a high level on what I'm thinking. We can modify translate lock to add the following:
- An additional build file for each included package, under a directory with the package name as it appears in the apko file.
- Each of these BUILD files contains a small wrapper rule around the contents of the APK, and a provider we can use to pass additional information (like the architecture, version, etc).
- The root BUILD file in the repository has a target added, exposing the signing keys.
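As a sketch of what those per-package BUILD files could contain (`apk_package` is a hypothetical wrapper rule name, not something that exists in rules_apko today, and the load path and labels are made up):

```starlark
# Hypothetical wrapper rule that exposes the APK contents plus a provider
# carrying metadata (architecture, version, etc.) to downstream rules.
load("@rules_apko//apko:private.bzl", "apk_package")  # hypothetical load

apk_package(
    name = "bash",
    apk = "@wolfi_bash_x86_64//:bash-5.2.21-r0.apk",
    package_name = "bash",
    version = "5.2.21-r0",
    architecture = "x86_64",
    visibility = ["//visibility:public"],
)
```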
This should lay the groundwork so we can collect specific packages, and our keys, from the repo. This would also provide specific targets we can refer to as packages within a rule. Then within our image-constructing rule we can do the following:
- Consume a list of packages from our attributes, and use them to build a local repository; these could be mostly symlinks, like the current cache implementation.
- The one thing I'm still a little fuzzy on is the APKINDEX - my understanding is that the lockfile produces a singular instance of each package. I think we could produce an index representing that lockfile and "just" bundle that in with the local repo.
- Generate an apko config file dynamically, including the paths to our signing keys, and a singular repository pointing to a directory in our workspace containing symlinks to the packages referenced in the build target.
- Run apko
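The dynamically generated config from the steps above could look roughly like this (paths, key names, and package pins are all illustrative; this is the standard apko config shape, but the specific values are assumptions):

```yaml
# Hypothetical apko config written by the rule at build time.
contents:
  keyring:
    - ./keys/wolfi-signing.rsa.pub      # exposed by the root BUILD file
  repositories:
    - ./local-repo                      # symlink farm built from the attrs
  packages:
    - bash=5.2.21-r0
    - ca-certificates-bundle=20230506-r0
```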
I'm skipping over plenty of details here, but I think this points to a method where we could have a singular config used to generate our lockfile, and use that to build multiple packages from a shared inventory.
Some of this may sound involved, but I don't think is too difficult. I've been playing for the last day or two, and think I'm close to a working implementation. If it sounds amenable, I'd be happy to clean it up for a contribution.
I'd really appreciate your thoughts @sfc-gh-mhazy on whether this is sound, as well as whether my assumption about the lockfile is correct.
> Consume a list of packages from our attributes, and use them to build a local repository, these could be mostly symlinks like the current cache implementation.
The difficulty here is that in apko.yaml (or any input format) you provide the packages you want, but there are also transitive dependencies. For example, the apko config lists bash; the actual list of packages installed is bash, glibc, etc.
In bazel each rule needs to explicitly list all dependencies. So you cannot dynamically depend on transitive dependencies.
So while it is possible to have a shared lock file, which in the extreme is a locked representation of the whole package repository, we still need to find and declare the transitive closure of packages needed for a given image.
For instance, rules_python parses the requirements lock file. The transitive pip dependencies are put in the deps of the direct dependencies. This way, if your py_binary depends on, for instance, pytorch, it is enough to declare pytorch as a dependency.
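For reference, the rules_python pattern described here looks like this in a BUILD file (repo and target names are illustrative):

```starlark
load("@pip//:requirements.bzl", "requirement")

py_binary(
    name = "train",
    srcs = ["train.py"],
    # Only the direct dependency is declared; the generated py_library
    # for torch lists its transitive pip dependencies in its own deps.
    deps = [requirement("torch")],
)
```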
In apk resolution it is not that simple - the transitive dependencies may change, depending on direct packages that you want.
For instance, there can be:
- package A1 that provides a
- package A2 that provides a
- package X that conflicts with A1 (!A1) and requires a
- package Y that conflicts with A2 (!A2) and requires a
- package Z that requires a

If you want to install X and Z, the transitive closure is (X, Z, A2). If you want to install Y and Z, the transitive closure is (Y, Z, A1).
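To make the example concrete, here is a toy resolver in plain Python showing how the closure flips between A1 and A2 depending on the direct package set. The conflicts/provides model is deliberately simplified (conflicts are only taken from the direct set, versions are ignored) and is not real apk semantics:

```python
def resolve(direct, packages):
    """Toy resolution: pick the first non-conflicting provider per requirement."""
    closure = set(direct)
    conflicts = {c for p in direct for c in packages[p].get("conflicts", ())}
    work = list(direct)
    while work:
        for req in packages[work.pop()].get("requires", ()):
            # choose any provider of the requirement not excluded by a conflict
            provider = next(
                name for name, meta in packages.items()
                if req in meta.get("provides", ()) and name not in conflicts
            )
            if provider not in closure:
                closure.add(provider)
                work.append(provider)
    return closure

PACKAGES = {
    "A1": {"provides": ["a"]},
    "A2": {"provides": ["a"]},
    "X":  {"requires": ["a"], "conflicts": ["A1"]},
    "Y":  {"requires": ["a"], "conflicts": ["A2"]},
    "Z":  {"requires": ["a"]},
}
```

With this model, `resolve({"X", "Z"}, PACKAGES)` yields the closure containing A2, while `resolve({"Y", "Z"}, PACKAGES)` yields the one containing A1.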
I don’t know of any other way to dynamically generate the list of dependencies than a repository rule. So we would still probably need a repository rule per image.
Perhaps the same approach as in rules_python could be used in apko as well: you can use a single requirements file for building multiple binaries. But you would need to build the dependency graph of packages and express it in the targets generated by translate_apko_lock.
I think that's a fair call-out, I wasn't aware that the spec allowed for the mutual exclusion of packages (I'm guessing that's what you mean by !A1? please correct me if I'm wrong). If you have an example handy, I'd love to see that in the spec or an actual index.
It would involve bringing along potentially unnecessary content, but I suppose one solution could be to mark all potential providers for a package as dependencies, and then leave apko to resolve the right set from the index at build time.
Also, on some reflection: can we even generate a lockfile if all the packages listed in the config file can't be installed together? My assumption is that an apko build will only succeed if it can successfully resolve and install all of them. If that's the case, then any subset of those listed packages should also be resolvable, and we've already made a determination as to what an appropriate dependency chain is. Not being able to handle mutually exclusive packages would be a limitation, but I honestly don't know how severe that would be for common use cases.
You are right that the ! constraint is rather an edge case. In wolfi I was able to find only https://github.com/wolfi-dev/os/blob/b1ec5c78802b73b35d815d3433489fded60a326b/glibc.yaml#L17 but this is just to prevent mixing with alpine.
I guess my point was more that, in theory, due to various complexities of the APKINDEX spec, having a universal lock may be impossible. That said, I agree that it would be useful to have it for most cases.
On transitive dependencies:
- There is a small „chicken and egg problem”. When building with apko with a lockfile, resolution does not happen at all; it just installs the list of packages. The resolution happens when you run apko lock.
- IIUC, you want to offload the resolution to the build phase and base it on a local APKINDEX. This is in principle a good idea, but in bazel you need to declare all the dependencies before building.
- You are correct that apko lock will work only if it can satisfy all the requirements. Currently it just returns the list of packages to install. If we had a more structured lock that tells why each package was added, we could build a dependency graph in translate_apko_lock.
- I think the providers idea is also feasible, but potentially heavy on the rules_apko side: IIUC you want to parse APKINDEX to find out which packages provide the requirements for a given package and put them in deps?
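A more structured lock entry could hypothetically look like this (a made-up "v2" shape, not the current apko lockfile format; the `dependencies` and `reason` fields are the additions being discussed):

```json
{
  "name": "bash",
  "version": "5.2.21-r0",
  "architecture": "x86_64",
  "url": "https://packages.wolfi.dev/os/x86_64/bash-5.2.21-r0.apk",
  "dependencies": ["glibc", "ncurses-terminfo-base"],
  "reason": "required by: cmd:bash (direct)"
}
```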
Thanks for helping me explore the problem @matiwertyl2!
I agree that a universal lock wouldn't work, but I think we're getting close to something that would work for most teams. For my understanding, is the mutual exclusion requirement something that's native to apk, or an idea introduced by apko? Looking through the spec, I'm able to find concepts like replaces or install_if, but can't find anything specific for mutual exclusion.
I think we're on the same page. As far as I see it, there are two possible ways to construct transitive dependency information:
- We enhance the lockfile to include the resolved dependencies for each package; this would make the problem very easy.
- We parse the APKINDEX files to find what each package requires and provides (or in a world where we can do whatever we want, add them to the lockfile), then mark any potential provider for a package as a dependency, provided it is also in the lockfile and matches the architecture of the package.
- This would allow apko to determine the right packages to pull in to the image at buildtime.
- I agree this seems heavy on the rules side, it'd be a little bit of a pain to implement.
- We could potentially be pulling in unnecessary dependencies at build time. That said, it feels unlikely that the rendered lockfile would include two packages that provide the same thing. Is that possible?
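If we did go down the APKINDEX-parsing road, the per-package stanzas are simple `K:value` lines separated by blank lines, so a minimal parser could look like this (the field letters P/V/A/D/p are from the APKINDEX format; version-constraint parsing and error handling are omitted):

```python
def parse_apkindex(text):
    """Parse APKINDEX stanzas into dicts, keeping only the fields
    relevant for dependency wiring: P (name), V (version), A (arch),
    D (depends, space-separated), p (provides, space-separated)."""
    packages = []
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            fields[key] = value
        packages.append({
            "name": fields.get("P"),
            "version": fields.get("V"),
            "arch": fields.get("A"),
            "depends": fields.get("D", "").split(),
            "provides": fields.get("p", "").split(),
        })
    return packages
```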
On the chicken and egg problem: I'm not sure we actually need to be building with the lockfile. In my eyes the biggest benefit of the lockfile is that it guarantees a set of packages from our repositories that are all satisfied within the set, which gives us our own repository. We can build by representing that repository on disk. We can either:
- Build a new index file for each architecture.
- This would be a pain since we'd be right back to parsing the existing APKINDEX files to do it.
- In a world where we're looking at lockfile v2, it'd be great if we also included the index information with each package so we could add a "build index file" subcommand.
- Take the existing indices, put them in separate directories, and decide which packages live alongside them by matching the URLs and architectures.
- In our dynamic config, we'd then have to tag these appropriately based on our URL match.
- This is totally fine - but feels a little ugly with the URL matching.