img Running multiple `img build` instances concurrently?

Running multiple img build instances concurrently with a single state dir is currently impossible, due to the boltdb lock.
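For readers unfamiliar with the lock in question: boltdb takes an exclusive file lock on its database file when it is opened read-write, so a second opener is refused or left waiting. The snippet below is only a minimal sketch of that behaviour using a plain `flock` on Linux/Unix. It is not img's or boltdb's code, and the file name `cache.db` is a made-up placeholder.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and try to take a non-blocking exclusive flock.

    Returns the open file descriptor on success, or None if another
    open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


state_dir = tempfile.mkdtemp()
db_path = os.path.join(state_dir, "cache.db")  # hypothetical file name

first = try_exclusive_lock(db_path)   # first "build" gets the lock
second = try_exclusive_lock(db_path)  # second "build" is refused
print(first is not None, second is None)

os.close(first)                       # closing the descriptor releases the lock
third = try_exclusive_lock(db_path)   # now the next opener succeeds
print(third is not None)
os.close(third)
```

The second attempt fails even inside one process because each `os.open` call creates its own open file description, which is the same situation two separate `img build` processes are in.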

Some solutions I can come up with:

1. Eliminate boltdb
   - Pro: Ideal
   - Con: Hard
2. Internally spawn buildkitd and containerd
   - Pro: Easy
   - Con: Too much complexity
3. Define img build-batch (example below)
   - Pro: Easy
   - Con: Weird UX. Multiple img commands still cannot be executed simultaneously.

$ cat << EOF | img build-batch 
-t foo /tmp/foo
-t example.com/bar --target release --push /tmp/bar 
EOF
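To make the build-batch idea a little more concrete, here is a rough sketch of how such input could be split into one argument list per build. `build-batch` does not exist in img; the function name `parse_batch` and the parsing rules (shell-style quoting, blank lines ignored) are assumptions for illustration only.

```python
import shlex


def parse_batch(text):
    """Split build-batch style input into one argv list per build."""
    builds = []
    for line in text.splitlines():
        # shlex handles quoting; comments=True drops '#' comments
        args = shlex.split(line, comments=True)
        if args:  # skip blank lines
            builds.append(args)
    return builds


batch = """\
-t foo /tmp/foo
-t example.com/bar --target release --push /tmp/bar
"""

for argv in parse_batch(batch):
    # A single process would run each build itself, sharing one open database.
    print(["img", "build"] + argv)
```

Because all builds would be dispatched from one process, only that process needs to hold the boltdb lock, which is why this option is easy but does nothing for separately launched img commands.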

Any thoughts?

May 18 '18 05:05 AkihiroSuda

I am all for the easiest solution... so #3 :)

May 21 '18 20:05 jessfraz

Our use case (a docker image ci-build-server) wouldn't benefit from #3 unfortunately.

May 24 '18 17:05 cameronr-nulogy

I like 3, although I'm not sure whether it makes sense for standalone builders to do this work or for higher-level pipelines to do it. Also, what about: 4. Just keep boltdb for now and allow parallel execution using multiple state dirs. This trades caching for concurrency; for heterogeneous builds it should improve them significantly at low implementation cost.
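A sketch of what option 4 could look like from a wrapper script: each build gets its own state directory, so no two processes touch the same boltdb file. This assumes img accepts a `--state` flag pointing at the state directory (check `img build --help` for the exact spelling in your version); the helper name `build_commands` and the directory layout are made up for the example.

```python
import os
import tempfile


def build_commands(builds, state_root):
    """Return one img command line per build, each with a private state dir."""
    cmds = []
    for i, (tag, context) in enumerate(builds):
        state_dir = os.path.join(state_root, f"state-{i}")
        os.makedirs(state_dir, exist_ok=True)
        # --state is assumed here; separate dirs mean no shared cache either
        cmds.append(["img", "build", "--state", state_dir, "-t", tag, context])
    return cmds


state_root = tempfile.mkdtemp(prefix="img-states-")
cmds = build_commands(
    [("foo", "/tmp/foo"), ("example.com/bar", "/tmp/bar")],
    state_root,
)
for cmd in cmds:
    print(" ".join(cmd))
```

Each command could then be started with `subprocess.Popen(cmd)` and waited on. The cost is the one named above: nothing cached by one build is visible to the others.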

May 24 '18 18:05 ehotinger

4.1: multiple state dir but with single blob storage dir? (gc might be hard)

May 24 '18 23:05 AkihiroSuda

could re-write the interface to use sqlite which will then allow multiple concurrent processes to open... i might try this in a timebox

actually just realized there are like 4 interfaces that would need that so nope lol

May 31 '18 21:05 jessfraz

Cc @tonistiigi

Jun 01 '18 00:06 AkihiroSuda

It is not the boltdb that causes problems here. That could be easily solved by just releasing the boltdb lock between batches of queries. The real problems start when two parallel builds return the same cache keys or when gc runs. The two cache keys are merged internally in the solver, and the solver also internally keeps track of the snapshot reference counts. If you run prune on one of the builders using the same state / boltdb, it is likely that the second build will fail with some panic-like error.
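The prune hazard can be shown with a toy model. This is not BuildKit's solver or snapshotter; the `Builder` class, its methods, and the key `sha256:layer1` are invented purely to illustrate what goes wrong when reference counts live in each process's memory while the underlying store is shared.

```python
class Builder:
    """Toy builder: private in-memory refcounts over a shared store."""

    def __init__(self, store):
        self.store = store  # shared between builders (stands in for the state dir)
        self.refs = {}      # private to this process

    def acquire(self, key):
        self.store.setdefault(key, "snapshot data")
        self.refs[key] = self.refs.get(key, 0) + 1

    def prune(self):
        # Remove everything this builder believes is unreferenced.
        for key in list(self.store):
            if self.refs.get(key, 0) == 0:
                del self.store[key]

    def use(self, key):
        return self.store[key]  # KeyError if another builder pruned it


shared_store = {}
a, b = Builder(shared_store), Builder(shared_store)

b.acquire("sha256:layer1")  # build B holds a snapshot
a.prune()                   # build A has no refs of its own, so it deletes it

try:
    b.use("sha256:layer1")
    outcome = "ok"
except KeyError:
    outcome = "snapshot missing"
print(outcome)
```

Builder A cannot see B's references, so its prune removes data B still depends on. Releasing the database lock between queries would not change that.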

I'm all for starting a hidden shared process that does the actual job. If you don't want to call it a daemon for some reason then that is fine. The shared process can go away as soon as there are no main processes running. If there are any changes needed for that in buildkit, then I'm happy to help. Either in combination with getting access to client for a controller directly while having this capability, or something like this in buildctl (or buildkitrun). There are some related ideas also in https://github.com/moby/buildkit/issues/237
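The lifetime rule for such a shared process ("goes away once no main processes are running") boils down to counting attached clients. The sketch below models only that bookkeeping in a single process; `SharedWorker`, `attach`, and `detach` are hypothetical names, and a real implementation would also need inter-process communication and locking around startup.

```python
class SharedWorker:
    """Toy model of a worker that lives only while clients are attached."""

    def __init__(self):
        self.clients = set()
        self.running = False

    def attach(self, client_id):
        if not self.running:
            self.running = True  # first client starts the worker
        self.clients.add(client_id)

    def detach(self, client_id):
        self.clients.discard(client_id)
        if not self.clients:
            self.running = False  # last client gone, worker exits


worker = SharedWorker()
worker.attach("build-1")
worker.attach("build-2")
worker.detach("build-1")
still_running = worker.running  # build-2 is still attached
worker.detach("build-2")
print(still_running, worker.running)
```

With a single owner of the solver state, cache key merging and reference counting happen in one place, which is the problem the shared process is meant to solve.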

Jun 01 '18 01:06 tonistiigi

The shared process can go away as soon as there are no main processes running. If there are any changes needed for that in buildkit, then I'm happy to help.

In the current design we need to use the containerd worker and spawn containerd in order to access the image store and the content store from the client (img).

This would be ok, but maybe we want to add image store service and content service to buildkitd (with runc worker) as well?

Jun 01 '18 06:06 AkihiroSuda

@AkihiroSuda This has come up before in https://github.com/moby/buildkit/pull/289. I'm OK with adding it to buildkitd, as at the moment it is a set of features that only works with a specific worker, and that isn't very nice. Another way would be to push the image store (really a reference store) to the client and provide a callback for the resolver/prune to check local images.

Jun 01 '18 17:06 tonistiigi

I would propose this be considered out of scope for img. Multiple img builds can be run in separate containers already, and if you want a daemon and scalable builds, I think buildctl would be a better option for those use cases.

Apr 25 '20 07:04 kekoav